Exploring SVI, TRI, and Health Outcomes

Author

Joshua Brinks

Overview

In this lesson, we will introduce you to 3 public health and air quality datasets: the EPA’s Toxic Release Inventory (TRI), the EPA’s Integrated Compliance Information System for Air (ICIS-AIR), and the CDC’s PLACES health outcomes dataset. You will learn how to form queries and retrieve each dataset from its respective API, process the returned object into a usable format, create visualizations, and perform simple analyses. You will also use NASA’s Social Vulnerability Index (SVI) to examine socioeconomic patterns in the greater Detroit metropolitan area and explore relationships with public health.

Programming Review

This lesson uses the Python programming environment.

Learning Objectives

After completing this lesson, you should be able to:

  • Retrieve multiple public health datasets through the EPA and CDC’s APIs.
  • Create multipanel raster maps from multilayered SVI data.
  • Geolocate tabular data using latitude and longitude.
  • Create maps joining geolocated points and basemaps.
  • Create maps with graduated point symbols for point source pollution releases.
  • Interpolate a raster surface from point spatial data.
  • Perform raster math to create specialized indexes.
  • Perform spatial correlation analysis.

Introduction

In many urban areas, air quality issues disproportionately affect low-income communities and communities of color. This disparity is a key focus of environmental justice efforts. In cities like Detroit, Chicago, and Cleveland, industrial facilities, highways, and other pollution sources are often concentrated in disadvantaged neighborhoods. Detroit has a history of industrial pollution and is working to address air quality issues in areas like Southwest Detroit, where residents face higher exposure to pollutants from nearby industries and heavy truck traffic. Similarly, Chicago has seen community efforts to address air pollution in areas like the Southeast Side, where residents have fought against polluting industries and advocated for stricter regulations.

The EPA is the primary federal agency responsible for environmental protection in the United States. It sets and enforces air quality standards, including the National Ambient Air Quality Standards (NAAQS) for six criteria pollutants. The EPA also maintains the AirNow system, which provides real-time air quality data to the public. The CDC (Centers for Disease Control and Prevention), while primarily focused on public health, plays a crucial role in understanding the health impacts of air pollution. It conducts research on the relationships between air quality and various health outcomes and provides guidance on protecting public health from environmental hazards.

In response to these challenges, community-driven science initiatives have emerged. These efforts involve local residents in collecting data on air quality and other environmental factors, often using low-cost sensors and mobile monitoring techniques. This approach helps fill gaps in official monitoring networks and empowers communities to advocate for themselves. Open data is crucial for community-driven solutions in several ways:

  • Transparency: Open data allows communities to verify official reports and hold authorities accountable.
  • Accessibility: When air quality data is freely available, communities can use it to inform local decision-making and advocacy efforts.
  • Innovation: Open data enables researchers, activists, and tech developers to create new tools and analyses that can benefit communities.
  • Collaboration: Open data facilitates collaboration between communities, scientists, and policymakers, leading to more effective solutions.

While the EPA and CDC provide federal networks of open-access datasets like the Air Quality System (AQS), TRI, ICIS-AIR, and PLACES collections, community-driven data collection is a large part of driving change in traditionally underrepresented communities. In Chicago, the Array of Things project has installed sensors throughout the city, providing open data on various environmental factors including air quality, while Detroit’s Community Air Monitoring Project uses low-cost sensors to collect and share air quality data in areas underserved by official monitoring stations.

With access to open data, communities can:

  • Identify local air quality hotspots that may be missed by sparse official monitoring networks.
  • Correlate air quality data with health outcomes to strengthen advocacy efforts.
  • Develop targeted interventions, such as promoting indoor air filtration on high-pollution days.
  • Create custom alerts and information systems tailored to local needs.

Although open data provides many benefits, challenges still remain. These include: 1) ensuring data quality and consistency, especially when integrating data from various sources; 2) addressing the “digital divide” to ensure all community members can access and use the data; 3) balancing the need for detailed local data with privacy concerns; and 4) building capacity within communities to effectively use and interpret complex environmental data.

The Michigan Department of Environment, Great Lakes, and Energy (EGLE) and its Office of the Environmental Justice Public Advocate are among the most active state agencies addressing issues of the environment, public health, and social justice. Here are some of the projects enacted in the greater Detroit area:

  • Detroit Environmental Agenda: This community-led initiative works closely with EGLE to address environmental concerns in Detroit. It focuses on issues like air quality, water quality, and waste management.
  • 48217 Community Air Monitoring Project: Named after the zip code of a heavily industrialized area in Southwest Detroit, this project involves community members working with EGLE to monitor air quality using low-cost sensors.
  • Detroit Climate Action Plan: Developed in partnership with EGLE, this plan addresses climate change impacts on the city, with a focus on vulnerable communities.
  • Delray Neighborhood Initiatives: EGLE has been involved in efforts to address air quality concerns in the Delray neighborhood, which is impacted by industrial emissions and heavy truck traffic.
  • Green Door Initiative: This Detroit-based organization collaborates with EGLE on various environmental justice projects, including lead abatement and air quality improvement efforts.
  • Detroit River Sediment Cleanup: EGLE has been involved in efforts to clean up contaminated sediments in the Detroit River, which disproportionately affects nearby low-income communities.
  • Asthma Prevention Programs: EGLE supports community-based asthma prevention programs in Detroit, recognizing the link between air quality and asthma rates in disadvantaged neighborhoods.

These are just a few examples that demonstrate the ongoing collaboration between community groups, local government, and EGLE to address environmental justice concerns in Detroit.

Air quality issues in urban areas disproportionately affect low-income communities and communities of color, making it a critical focus for environmental justice efforts. Federal agencies like the EPA and CDC play crucial roles in setting standards and conducting research, but community-driven science initiatives have emerged as powerful tools for local action. Open data is key to these efforts, enabling transparency, accessibility, innovation, and collaboration. Cities like Detroit and Chicago are at the forefront of these initiatives, with projects that empower residents to monitor and advocate for better air quality. While challenges remain, particularly in data management and accessibility, the collaboration between community groups, local governments, and state agencies like Michigan’s EGLE demonstrates a promising path forward.

Data Analysis and Exercises

To better understand and address these complex issues, we’ll work through a series of coding and data science exercises. These hands-on activities will let you explore some of the open datasets and tools available to community members and stakeholders, offering practical insights into air quality, public health, and social vulnerability.

We’ll begin with the ICIS-AIR dataset, then move on to the TRI Facility dataset, and finally drill down to TRI Form A. After exploring these 3 air pollution datasets, we will take a look at the CDC PLACES health outcomes data and perform some simple analysis integrating the pollution and health outcomes data.

Data Science Review

This lesson uses the pandas, geopandas, matplotlib, numpy, requests, contextily, pygris, rasterio, and xarray modules. If you’d like to learn more about the functions used in this lesson, you can refer to the documentation on their respective websites.

The pandas module is essential for data manipulation and analysis, while geopandas extends its functionality to handle geospatial data. matplotlib is used for creating static, animated, and interactive visualizations. numpy provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

The requests module allows you to send HTTP requests and interact with APIs easily. contextily adds basemaps to your plots, enhancing the visual context of your geospatial data. pygris simplifies the process of working with US Census Bureau TIGER/Line shapefiles.

rasterio and xarray are used for working with geospatial raster data. rasterio reads and writes geospatial raster datasets, while xarray introduces labels in the form of dimensions, coordinates, and attributes on top of raw NumPy-like arrays, making it easier to work with labeled multi-dimensional arrays.
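To make xarray’s labeled-array idea concrete, here is a minimal sketch (the dimension names, coordinates, and values are made up for illustration):

```python
import numpy as np
import xarray as xr

# A tiny 2x3 grid of hypothetical values, labeled with y/x coordinates
readings = xr.DataArray(
    np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]]),
    dims=("y", "x"),
    coords={"y": [42.3, 42.4], "x": [-83.2, -83.1, -83.0]},
    name="reading",
)

# Select by coordinate label rather than positional index
value = readings.sel(y=42.4, x=-83.0).item()
print(value)  # 6.0
```

Selecting by labeled coordinates, rather than remembering which axis is which, is the main convenience xarray adds over raw NumPy arrays.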

Make sure these modules are installed before you begin working with the code in this document. You can install them using pip:
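For example, the modules can be installed with a single pip command. The code later in the lesson also imports scipy, shapely, rioxarray, and tabulate, so they are included here as well; package names are assumed to match the import names:

```shell
pip install pandas geopandas matplotlib numpy requests contextily pygris rasterio xarray scipy shapely rioxarray tabulate
```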

ICIS-AIR

ICIS-AIR (Integrated Compliance Information System for Air) is a comprehensive database maintained by the U.S. Environmental Protection Agency (EPA) that focuses specifically on air quality compliance and enforcement data. It tracks information related to stationary sources of air pollution, including their compliance with various Clean Air Act regulations, permit data, and enforcement actions. While ICIS-AIR and the Toxic Release Inventory (TRI) are separate systems, they have significant overlap in their coverage of industrial facilities and air emissions. Many facilities that report to TRI also have data in ICIS-AIR, providing complementary information.

Where TRI focuses on the quantities of specific toxic chemicals released into the air, ICIS-AIR offers a broader picture of a facility’s overall air quality compliance status, including non-toxic pollutants regulated under the Clean Air Act. Together, these systems provide a more comprehensive view of industrial air pollution sources, enabling researchers, regulators, and the public to assess both the types and quantities of air pollutants emitted (from TRI) and the regulatory compliance status of the emitting facilities (from ICIS-AIR).
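Because the ICIS-AIR ECHO output includes a TRIIDs column, records from the two systems can be linked in pandas. Here is a minimal sketch with made-up facility IDs and a hypothetical TRI column name:

```python
import pandas as pd

# Made-up rows illustrating the linkage; the TRI ID and the
# TOTAL_AIR_RELEASES_LB column name are hypothetical
icis = pd.DataFrame({
    "AIRName": ["PLANT A", "PLANT B"],
    "TRIIDs": ["48217PLNTA123ST", None],  # not every facility reports to TRI
})
tri = pd.DataFrame({
    "TRI_FACILITY_ID": ["48217PLNTA123ST"],
    "TOTAL_AIR_RELEASES_LB": [1250.0],
})

# Left-join keeps every ICIS-AIR facility, attaching TRI data where present
merged = icis.merge(tri, left_on="TRIIDs", right_on="TRI_FACILITY_ID", how="left")
print(merged[["AIRName", "TOTAL_AIR_RELEASES_LB"]])
```

A left join is the natural choice here: facilities without a TRI match stay in the result with missing TRI values rather than being dropped.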

Data Processing

Begin by importing the required Python modules.

import xarray as xr
import rasterio
import rasterio.mask
from rasterio.enums import Resampling
from scipy import stats
from rasterio.transform import from_origin
from matplotlib.colors import BoundaryNorm, ListedColormap
import geopandas as gpd
import matplotlib.pyplot as plt
import pygris
import numpy as np
from shapely.geometry import box
import pandas as pd
import requests
import contextily as ctx
import time
import rioxarray
Defining Our Study Area

Next we’ll query the API for ICIS-AIR regulated facilities in the greater Detroit area, but first we need to create a spatial object that defines our study area, which we can use throughout this analysis. The U.S. Census Bureau defines the Detroit, MI Metropolitan Statistical Area (MSA) as including 6 counties (Wayne, Macomb, Oakland, Lapeer, Livingston, and St. Clair), but for this lesson we will focus on the core 3 counties (Wayne, Macomb, and Oakland); both population and air-emissions-related facilities are much sparser in the outer counties.

We can use pygris to get vector boundaries for the counties and dissolve them into a single boundary we can use for cropping data and restricting API searches.

counties = ['Wayne', 'Oakland', 'Macomb']

# Fetch Detroit metro area counties using pygris
metro_counties = pygris.counties(state="MI", year=2022)
detroit_metro = metro_counties[metro_counties['NAME'].isin(counties)]

# Dissolve the counties into a single polygon
detroit_metro = detroit_metro.dissolve(by='STATEFP')

# Convert to GeoDataFrame
detroit_metro = gpd.GeoDataFrame(detroit_metro, geometry='geometry', crs='EPSG:4269')

# Get the bounding box
bbox = detroit_metro.total_bounds

# Create the bounding box as a polygon
bbox_polygon = gpd.GeoDataFrame(
    geometry=[box(*bbox)],
    crs=detroit_metro.crs
)
Using FIPS code '26' for input 'MI'

Let’s verify this by overlaying it on a basemap.

# Create the plot
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Reproject data to Web Mercator to match the basemap
detroit_metro_bm = detroit_metro.to_crs(epsg=3857)
bbox_polygon_bm = bbox_polygon.to_crs(epsg=3857)

# Plot the metro area and bounding box
detroit_metro_bm.plot(ax=ax, color='lightblue', edgecolor='darkblue', alpha=0.3)
bbox_polygon_bm.boundary.plot(ax=ax, color='grey', linewidth=2)

# Add the basemap
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent of the map to the bounding box
ax.set_xlim(bbox_polygon_bm.total_bounds[0], bbox_polygon_bm.total_bounds[2])
ax.set_ylim(bbox_polygon_bm.total_bounds[1], bbox_polygon_bm.total_bounds[3])

# Remove axes
ax.set_axis_off()

plt.title("Detroit Metro Study Area", fontsize=16)
plt.tight_layout()
plt.show()

This looks correct, and we’ll use this boundary object repeatedly throughout our analysis.

Querying the API

The EPA’s Enforcement and Compliance History Online (ECHO) website serves as a comprehensive portal for environmental compliance and enforcement data. It includes the ECHO Data Service, which provides programmatic access to EPA data through RESTful APIs. Among these is the ICIS-AIR API, which specifically offers access to air quality compliance and enforcement data. This API allows users to retrieve detailed information about stationary sources of air pollution, including facility details, permit data, emissions reports, and compliance status.

The ECHO API requires that users follow a multi-step workflow to acquire data via the Facility Air API:

  1. Use get_facilities to validate parameters, get summary statistics, and obtain a query_id (QID), valid for about 30 minutes.
  2. Use get_qid with the obtained QID to paginate through facility results.
  3. Use get_map with the QID to visualize and navigate facility locations.
  4. Use get_download with the QID to generate a CSV file of facility information.

The get_facility_info endpoint operates independently, returning either clustered summary statistics or an array of individual facilities. This API structure allows for efficient querying, visualization, and extraction of air quality compliance data, making it a valuable resource for environmental analysis and research.

We’ll begin by querying this database for all Michigan facilities using the workflow detailed above. From there we’ll subset with our list of counties.

We start with the base url for the ECHO API and our parameter list.

# Base URL for ECHO ICIS-AIR API
base_url = "https://echodata.epa.gov/echo/air_rest_services"

# Parameters for the initial API call
params = {
    "output": "JSON",
    "p_st": "MI"
}
Data Science Review

An Application Programming Interface (API) is a set of protocols, tools, and definitions that enable different software applications to communicate with each other. It acts as an intermediary, allowing one application to request specific data or actions from another application or service without needing access to its entire codebase. APIs define the methods and data formats that applications can use to request and exchange information, typically through specific endpoints (URLs) that accept requests and return responses in standardized formats like JSON. They are crucial for integrating external services, accessing databases, and enabling communication between different parts of larger open science software systems.
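As a concrete illustration of endpoints and parameters, requests can assemble the kind of request we make in this lesson without sending it; the URL is only prepared here, so no network call occurs:

```python
import requests

# Endpoint and parameters matching the ECHO query used in this lesson
url = "https://echodata.epa.gov/echo/air_rest_services.get_facilities"
params = {"output": "JSON", "p_st": "MI"}

# Build the request without sending it, to inspect the final URL
prepared = requests.Request("GET", url, params=params).prepare()
print(prepared.url)
# https://echodata.epa.gov/echo/air_rest_services.get_facilities?output=JSON&p_st=MI
```

The parameter dictionary becomes the query string after the `?`; this is exactly what `requests.get(url, params=...)` sends under the hood.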

Now we define 2 functions that carry out the first 2 steps: 1) identify the facilities in our search and 2) retrieve the facility data.

# Define the function to query the API for the facilities of interest
def get_facilities():
    response = requests.get(f"{base_url}.get_facilities", params=params)
    if response.status_code == 200:
        data = response.json()
        if 'Results' in data:
            qid = data['Results']['QueryID']
            print(f"Query ID: {qid}")
            print(f"Total Facilities: {data['Results']['QueryRows']}")
            return qid
    print("Failed to get facilities and QID")
    return None

# Define the function to retrieve the relevant data from the facilities of interest
def get_facility_data(qid):
    all_facilities = []
    page = 1
    while True:
        params = {"qid": qid, "pageno": page, "output": "JSON"}
        response = requests.get(f"{base_url}.get_qid", params=params)
        if response.status_code == 200:
            data = response.json()
            if 'Results' in data and 'Facilities' in data['Results']:
                facilities = data['Results']['Facilities']
                if not facilities:  # No more facilities to retrieve
                    break
                all_facilities.extend(facilities)
                print(f"Retrieved page {page}")
                page += 1
            else:
                break
        else:
            print(f"Failed to retrieve page {page}")
            break
    return all_facilities

Now that we’ve defined the functions, we can make a request to the API for Michigan facilities; we’ll subset to the Detroit metro counties afterward.

# Step 1: Get the Query ID
qid = get_facilities()

if qid:
    # Step 2: Use get_qid to retrieve all facility data
    print("Retrieving facility data...")
    facilities = get_facility_data(qid)
    
    # Convert to DataFrame
    df_icis_air = pd.DataFrame(facilities)
    
    print(f"\nSuccessfully retrieved {len(df_icis_air)} ICIS-AIR facilities for Michigan")
    print("\nColumns in the dataset:")
    print(df_icis_air.columns)
    
    # Display the first few rows
    print("\nFirst few rows of the data:")
    print(df_icis_air.head())
    
    # Save to CSV
else:
    print("Failed to retrieve facility data")
Query ID: 162
Total Facilities: 3395
Retrieving facility data...
Retrieved page 1

Successfully retrieved 3395 ICIS-AIR facilities for Michigan

Columns in the dataset:
Index(['AIRName', 'SourceID', 'AIRStreet', 'AIRCity', 'AIRState',
       'LocalControlRegionCode', 'AIRZip', 'RegistryID', 'AIRCounty',
       'AIREPARegion', 'FacFederalAgencyCode', 'FacFederalAgencyName',
       'FacDerivedHuc', 'FacFIPSCode', 'FacIndianCntryFlg',
       'AIRIndianCntryFlg', 'FacIndianSpatialFlg', 'FacDerivedTribes',
       'FacUsMexBorderFlg', 'FacSICCodes', 'AIRNAICS', 'FacLat', 'FacLong',
       'AIRPrograms', 'AIRMacts', 'AIRStatus', 'AIRUniverse',
       'AIRClassification', 'AIRCmsCategoryCode', 'AIRCmsCategoryDesc',
       'FacDerivedWBD', 'FacDerivedWBDName', 'ChesapeakeBayFlag', 'AIRIDs',
       'CWAIDs', 'RCRAIDs', 'RmpIDs', 'SDWAIDs', 'TRIIDs', 'GHGIDs', 'EisIDs',
       'CamdIDs', 'AIRComplStatus', 'AIRHpvStatus', 'AIRMnthsWithHpv',
       'AIRQtrsWithHpv', 'AIRQtrsWithViol', 'AIRPollRecentViol',
       'AIRRecentViolCnt', 'AIRLastViolDate', 'AIREvalCnt', 'AIRDaysLastEval',
       'AIRLastEvalDate', 'AIRLastEvalDateEPA', 'AIRLastEvalDateState',
       'AIRFceCnt', 'AIRDaysLastFce', 'AIRLastFceDate', 'AIRLastFceDateEPA',
       'AIRLastFceDateState'],
      dtype='object')

First few rows of the data:
                      AIRName            SourceID  \
0        16TH STREET PARTNERS  MI00000000000A5933   
1           2/90 SIGN SYSTEMS  MI00000000000N1511   
2               2600 WBB, LLC  MI00000000000P0376   
3        3M DETROIT ABRASIVES  MI00000000000N2999   
4  42 DEGREES PROCESSING, LLC  MI00000000000P1215   

                        AIRStreet       AIRCity AIRState  \
0                   570 E 16TH ST       HOLLAND       MI   
1  5350 CORPORATE GROVE BOULEVARD  GRAND RAPIDS       MI   
2       2600 WEST BIG BEAVER ROAD          TROY       MI   
3           11900 E EIGHT MILE RD       DETROIT       MI   
4                606 S PARK DRIVE      KALKASKA       MI   

  LocalControlRegionCode AIRZip    RegistryID AIRCounty AIREPARegion  ...  \
0                   None  49423  110009597016    Ottawa           05  ...   
1                   None  49512  110003707944      Kent           05  ...   
2                   None  48084  110056435889   Oakland           05  ...   
3                   None  48205  110000593055     Wayne           05  ...   
4                   None  49646  110071312793  Kalkaska           05  ...   

  AIREvalCnt AIRDaysLastEval AIRLastEvalDate AIRLastEvalDateEPA  \
0          0            None            None               None   
1          2             626      12/14/2022               None   
2          0            3956      11/01/2013               None   
3          2             709      09/22/2022               None   
4          2             793      06/30/2022               None   

  AIRLastEvalDateState AIRFceCnt AIRDaysLastFce AIRLastFceDate  \
0                 None         0           None           None   
1           12/14/2022         1            626     12/14/2022   
2           11/01/2013         0           3956     11/01/2013   
3           09/22/2022         1            709     09/22/2022   
4           06/30/2022         1            793     06/30/2022   

  AIRLastFceDateEPA AIRLastFceDateState  
0              None                None  
1              None          12/14/2022  
2              None          11/01/2013  
3              None          09/22/2022  
4              None          06/30/2022  

[5 rows x 60 columns]

There are over three thousand facilities in the ICIS-AIR Michigan dataset. Here are the descriptions for some key columns.

Key Columns in the ICIS-AIR ECHO API Dataset

  • AIRName: The name of the facility
  • SourceID: A unique identifier for the air pollution source
  • AIRStreet, AIRCity, AIRState, AIRZip: Address components of the facility
  • RegistryID: A unique identifier for the facility in the EPA’s registry
  • AIRCounty: The county where the facility is located
  • AIREPARegion: The EPA region responsible for the facility
  • AIRNAICS: North American Industry Classification System code(s) for the facility
  • FacLat, FacLong: Latitude and longitude coordinates of the facility
  • AIRPrograms: Air quality programs applicable to the facility
  • AIRStatus: Current operational status of the facility
  • AIRUniverse: Categorization of the facility within air quality regulation
  • AIRComplStatus: Current compliance status under the Clean Air Act
  • AIRHpvStatus: High Priority Violator status
  • AIRQtrsWithViol: Number of quarters with violations
  • AIRLastViolDate: Date of the most recent violation
  • AIREvalCnt: Count of evaluations conducted
  • AIRLastEvalDate: Date of the most recent evaluation
  • AIRFceCnt: Count of Full Compliance Evaluations
  • AIRLastFceDate: Date of the last Full Compliance Evaluation

These columns provide information about each facility’s location, operational characteristics, compliance history, and regulatory oversight under the Clean Air Act. They include information on violations, evaluations, and compliance status that can be used to assess a facility’s environmental performance and regulatory compliance.
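Several of these fields come back from the API as strings (we handle AIRQtrsWithViol this way later in the lesson); a minimal sketch with made-up rows showing the coercions:

```python
import pandas as pd

# Made-up rows mimicking the string-typed columns returned by the API
df = pd.DataFrame({
    "AIRQtrsWithViol": ["12", "0", None],
    "AIRLastViolDate": ["04/23/2024", None, "08/29/2022"],
    "FacLat": ["42.27", "42.33", None],
})

# Coerce to numeric/datetime; unparseable or missing values become NaN/NaT
df["AIRQtrsWithViol"] = pd.to_numeric(df["AIRQtrsWithViol"], errors="coerce")
df["AIRLastViolDate"] = pd.to_datetime(df["AIRLastViolDate"], format="%m/%d/%Y")
df["FacLat"] = pd.to_numeric(df["FacLat"], errors="coerce")

print(df.dtypes)
```

Using `errors="coerce"` keeps a stray non-numeric value from raising an exception; it simply becomes missing, which we can then drop or impute.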

We can check some basic information like how many facilities are in Detroit metro and the breakdown by our 3 counties.

# Subset the dataframe to include only the Detroit metro counties
icis_air_detroit = df_icis_air[df_icis_air['AIRCounty'].isin(counties)]

# Print information about the subset
print(f"Total ICIS-AIR facilities in Michigan: {len(df_icis_air)}")
print(f"ICIS-AIR facilities in Detroit metro area: {len(icis_air_detroit)}")

# Display the count of facilities in each metro county
print("\nFacilities per county:")
print(icis_air_detroit['AIRCounty'].value_counts())
Total ICIS-AIR facilities in Michigan: 3395
ICIS-AIR facilities in Detroit metro area: 903

Facilities per county:
AIRCounty
Wayne      431
Oakland    272
Macomb     200
Name: count, dtype: int64

Next we should check for facilities with missing latitude or longitude coordinates.

# Count records with missing coordinate values
missing_coords = icis_air_detroit[(icis_air_detroit['FacLat'].isnull()) | (icis_air_detroit['FacLong'].isnull())]
print(f"Number of ICIS-AIR records with missing coordinates: {len(missing_coords)}")

# Remove records with missing coordinates
icis_air_detroit = icis_air_detroit.dropna(subset=['FacLat', 'FacLong'])
Number of ICIS-AIR records with missing coordinates: 0

No missing records! That’s great. As you’ll find out later, you won’t always be so lucky.

Now we can create a spatial object, reproject it so it matches the basemap of Detroit, and make a map showing the location of all the ICIS-AIR facilities in Detroit metro.

# Create a GeoDataFrame for ICIS-AIR facilities
gdf_icis_air = gpd.GeoDataFrame(
    icis_air_detroit, 
    geometry=gpd.points_from_xy(icis_air_detroit.FacLong, icis_air_detroit.FacLat),
    crs="EPSG:4326"
)

# Reproject ICIS-AIR data to Web Mercator so it matches the basemap
gdf_icis_air_bm = gdf_icis_air.to_crs(epsg=3857)

# Create the plot
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Plot the metro area and bounding box (reusing objects from earlier)
detroit_metro_bm.plot(ax=ax, facecolor='none', edgecolor='blue', linewidth=2)
bbox_polygon_bm.boundary.plot(ax=ax, color='red', linewidth=2)

# Plot ICIS-AIR facilities
gdf_icis_air_bm.plot(ax=ax, color='cyan', markersize=50, alpha=0.5, label='ICIS-AIR Facilities', edgecolor = "grey")

# Add the basemap
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent of the map to the bounding box
ax.set_xlim(bbox_polygon_bm.total_bounds[0], bbox_polygon_bm.total_bounds[2])
ax.set_ylim(bbox_polygon_bm.total_bounds[1], bbox_polygon_bm.total_bounds[3])

# Remove axes
ax.set_axis_off()

# Add legend
ax.legend()

plt.title("Detroit Metro Area ICIS-AIR Facilities", fontsize=16)
plt.tight_layout()
plt.show()

print(f"Number of ICIS-AIR facilities plotted: {len(gdf_icis_air)}")

Number of ICIS-AIR facilities plotted: 903

These facilities are distributed across business and industrial areas as expected, but we can’t infer much else from this map.

Which regulated facilities have been in violation status? We can plot all the facilities with at least one violation during a reporting quarter and use graduated symbols that reflect the total number of quarters with violations. This information is in the AIRQtrsWithViol column.

# Convert to numeric, replacing non-numeric values with NaN
gdf_icis_air_bm['AIRQtrsWithViol'] = pd.to_numeric(gdf_icis_air_bm['AIRQtrsWithViol'], errors='coerce')

# Subset the dataset to facilities with at least one violation
gdf_icis_air_violators = gdf_icis_air_bm[gdf_icis_air_bm['AIRQtrsWithViol'] > 0]

# Create the plot
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Plot the metro area and bounding box (reusing objects from earlier)
detroit_metro_bm.plot(ax=ax, facecolor='none', edgecolor='blue', linewidth=2)
bbox_polygon_bm.boundary.plot(ax=ax, color='red', linewidth=2)

# Plot ICIS-AIR facilities with graduated symbols based on quarters in violation
scatter = ax.scatter(gdf_icis_air_violators.geometry.x,
                     gdf_icis_air_violators.geometry.y, 
                     s=gdf_icis_air_violators['AIRQtrsWithViol']*10,  # Adjust the multiplier as needed
                     c='orangered', 
                     edgecolor='yellow', 
                     linewidth=1, 
                     alpha=0.7)

# Add the basemap
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent of the map to the bounding box
ax.set_xlim(bbox_polygon_bm.total_bounds[0], bbox_polygon_bm.total_bounds[2])
ax.set_ylim(bbox_polygon_bm.total_bounds[1], bbox_polygon_bm.total_bounds[3])

# Remove axes
ax.set_axis_off()

# Add a legend keyed to symbol size (quarters with violations)
handles, labels = scatter.legend_elements(prop="sizes", num=4, func=lambda s: s / 10)
ax.legend(handles, labels, title="Qtrs w/ Viol", loc="lower right")

plt.title("Detroit Metro Area ICIS-AIR Facilities w/ Violations", fontsize=16)
plt.tight_layout()
plt.show()

We can also create a table showing which businesses have been in violation status, listing their name, address, and the date of their most recent violation, using the tabulate module.

from tabulate import tabulate

# Restrict the table to facilities with 10 or more quarters in violation
gdf_icis_air_violators = gdf_icis_air_bm[gdf_icis_air_bm['AIRQtrsWithViol'] >= 10]

# List the columns we want to use
table_columns = ['AIRName', 'AIRStreet', 'AIRCity', 'AIRQtrsWithViol', 'AIRLastViolDate']

# Create a new DataFrame with only the columns we want
table_data = gdf_icis_air_violators[table_columns].copy()

# Ensure AIRLastViolDate is in datetime format
table_data['AIRLastViolDate'] = pd.to_datetime(table_data['AIRLastViolDate'])

# Sort the DataFrame by AIRLastViolDate in descending order
table_data = table_data.sort_values('AIRLastViolDate', ascending=False)

# Create a dictionary to map old column names to pretty column names
pretty_names = {
    'AIRName': 'Name',
    'AIRStreet': 'Street',
    'AIRCity': 'City',
    'AIRQtrsWithViol': 'Violations',
    'AIRLastViolDate': 'Violation Date'
}

# Rename the columns
table_data.rename(columns=pretty_names, inplace=True)

# Format the 'Violation Date' column to a specific date format (optional)
table_data['Violation Date'] = table_data['Violation Date'].dt.strftime('%Y-%m-%d')

# Create the table
table = tabulate(table_data, headers='keys', tablefmt='html', showindex=False)
table 
Name Street City Violations Violation Date
EES COKE BATTERY L.L.C. 1400 ZUG ISLAND ROAD RIVER ROUGE 12 2024-04-23
ROSATI SPECIALTIES 24200 CAPITAL BLVD. CLINTON TWP 12 2024-03-21
CARMEUSE LIME INC, RIVER ROUGE OPERATION 25 MARION AVE RIVER ROUGE 12 2023-08-17
BASF CORPORATION - CHEMICAL PLANTS 1609 BIDDLE AVE WYANDOTTE 12 2023-03-24
MARATHON PETROLEUM COMPANY LP 1001 S OAKWOOD DETROIT 12 2023-03-15
FCA US LLC - DETROIT ASSEMBLY COMPLEX 2101 CONNER AVE DETROIT 12 2023-01-25
U S STEEL GREAT LAKES WORKS 1 QUALITY DR ECORSE 12 2022-11-07
FCA US LLC WARREN TRUCK ASSEMBLY PLANT 21500 MOUND ROAD WARREN 12 2022-10-21
NYLOK LLC 15260 HALLMARK COURT MACOMB 12 2022-08-29
AJAX METAL PROCESSING INC. 4651 BELLEVUE AVE DETROIT 10 2022-06-07
STERLING ENGINE HOLDINGS LLC 54420 PONTIAC TRAIL MILFORD 12 2021-06-10
DETROIT RENEWABLE POWER, LLC 5700 RUSSELL ST DETROIT 12 2019-08-29
JVISFH, LLC 23944 FREEWAY PARK DRIVE FARMINGTN HLS 12 2019-03-22
LEAR CORPORATION DBA EAGLE OTTAWA 2930 WEST AUBURN RD ROCHESTER HLS 11 2014-11-10
BASF CORPORATION - PLASTIC PLANTS 1609 BIDDLE AVE. WYANDOTTE 12 2014-06-24
WYANDOTTE DEPT MUNI POWER PLANT 2555 VAN ALSTYNE WYANDOTTE 12 2014-04-21
POWER SOLUTIONS INTERNATIONAL 32505 INDUSTRIAL DRIVE MADISON HTS 12 2013-12-09
HENRY FORD WEST BLOOMFIELD HOSPITAL 6777 WEST MAPLE ROAD W BLOOMFIELD 12 2012-05-03
HD INDUSTRIES 19455 GLENDALE DETROIT 12 2012-04-18

ICIS-AIR is a powerful tool for communities and policy-makers to identify regulated facilities and their compliance status, but the dataset does have limitations. Next, we’ll take a look at facilities required to report to the Toxic Release Inventory (TRI).

Environmental Justice In the News

The Michigan Department of Environment, Great Lakes and Energy (EGLE) has settled a civil rights complaint regarding a hazardous waste facility in Detroit. The complaint, filed in 2020 by environmental groups and local residents, challenged the renewal and expansion of U.S. Ecology North’s license, arguing it was unjust to increase hazardous waste storage in a predominantly low-income and minority neighborhood. As part of the settlement, EGLE will now consider environmental justice concerns in future licensing decisions for hazardous waste facilities.

The agreement, described as “groundbreaking” by the Sierra Club, introduces several new measures. EGLE will conduct environmental justice and cumulative impact analyses for future licensing decisions, potentially denying licenses that would have an unlawful impact on human health and the environment. The settlement also includes provisions for improved community engagement, such as better translation services, public input processes, and the installation of air monitors around U.S. Ecology North. Additionally, the state will work with local residents to conduct a community health assessment. While some details remain to be clarified, environmental advocates view this as a significant step towards advancing environmental justice in Michigan.

Toxic Release Inventory

The Toxic Release Inventory (TRI) is a crucial resource in understanding and addressing air quality issues in the United States. Established under the Emergency Planning and Community Right-to-Know Act of 1986, the TRI is a publicly accessible database maintained by the Environmental Protection Agency (EPA). It requires certain industrial facilities to report annually on their releases of toxic chemicals into the environment, including air emissions. The TRI provides valuable data on over 770 chemicals and chemical categories, offering insights into the types and quantities of pollutants released into the air by various industries. This information is vital to researchers, policymakers, and community advocates in assessing local air quality, identifying potential health risks, and developing targeted strategies to reduce toxic air emissions. By making this data publicly available, the TRI plays an important role in promoting transparency and supporting environmental justice initiatives focused on improving air quality in communities across the nation.

Envirofacts API

EnviroFacts is a comprehensive online database and information system maintained by the U.S. Environmental Protection Agency (EPA). It serves as a centralized hub for accessing a wide range of environmental data collected by the EPA and other federal agencies. EnviroFacts integrates information from multiple EPA databases, covering various aspects of environmental health, including air quality, water quality, hazardous waste, and toxic releases. One of the key components of EnviroFacts is the Toxic Release Inventory (TRI) data. Through EnviroFacts, users can easily access and query TRI information, allowing them to investigate toxic chemical releases and waste management activities in their local areas. The integration of TRI data within the broader EnviroFacts system enables researchers, policymakers, and community members to contextualize toxic release information alongside other environmental indicators.

Envirofacts is available as a web based search and data platform and also as a programmatic API. The web platform is a great way to familiarize yourself with the available datasets and create simple downloads, however, for analytical purposes we recommend learning to navigate their API so you can create repeatable and reliable analysis.

Querying the API

We’ll continue to use the Detroit metro boundary we created earlier. We assigned it the names detroit_metro and detroit_metro_bm (the latter projected to Web Mercator to match basemaps for plotting).

# Print the bounding box coordinates
print("Bounding Box:")
print(f"Minimum X (Longitude): {bbox[0]}")
print(f"Minimum Y (Latitude): {bbox[1]}")
print(f"Maximum X (Longitude): {bbox[2]}")
print(f"Maximum Y (Latitude): {bbox[3]}")
Bounding Box:
Minimum X (Longitude): -83.689438
Minimum Y (Latitude): 42.02793
Maximum X (Longitude): -82.705966
Maximum Y (Latitude): 42.897541

Now we’ll create an empty list to store the TRI data, loop through each county making an API query, and append the retrieved data to the list.

# Fetch TRI facility data from EPA API for each county
tri_data = []

# Loop through each county to make a separate API inquiry (was having trouble attempting all at once)
for county in counties:
    api_url = f"https://data.epa.gov/efservice/tri_facility/state_abbr/MI/county_name/{county}/JSON"
    response = requests.get(api_url)
    # Is our response good
    if response.status_code == 200:
        county_data = response.json()
        # Add the county to the list
        tri_data.extend(county_data)
    else:
        print(f"Failed to fetch data for {county} County. Status code: {response.status_code}")

Processing the Data

The TRI data comes in JSON format. It can be a little confusing to interpret because each record (traditionally a row in a table) is displayed with its columns structured vertically. Let’s look at one of the records.

tri_data[1]
{'tri_facility_id': '48007GRWGR3155W',
 'facility_name': 'PPG GROW DETROIT',
 'street_address': '14000 STANSBURY',
 'city_name': 'DETROIT',
 'county_name': 'WAYNE',
 'state_county_fips_code': '26163',
 'state_abbr': 'MI',
 'zip_code': '48227',
 'region': '5',
 'fac_closed_ind': '0',
 'mail_name': 'PPG INDS.',
 'mail_street_address': '1330 PIEDMONT ATTN:  RICH MCCURDY',
 'mail_city': 'TROY',
 'mail_state_abbr': 'MI',
 'mail_province': None,
 'mail_country': None,
 'mail_zip_code': '48083',
 'asgn_federal_ind': 'C',
 'asgn_agency': None,
 'frs_id': None,
 'parent_co_db_num': '001344803',
 'parent_co_name': 'PPG INDS INC',
 'fac_latitude': 422146,
 'fac_longitude': 831255,
 'pref_latitude': 42.3903,
 'pref_longitude': 83.182,
 'pref_accuracy': 150,
 'pref_collect_meth': 'A1',
 'pref_desc_category': 'PG',
 'pref_horizontal_datum': '1',
 'pref_source_scale': 'J',
 'pref_qa_code': '1000',
 'asgn_partial_ind': '0',
 'asgn_public_contact': None,
 'asgn_public_phone': None,
 'asgn_public_contact_email': None,
 'bia_code': None,
 'standardized_parent_company': None,
 'asgn_public_phone_ext': None,
 'epa_registry_id': '110041981807',
 'asgn_technical_contact': None,
 'asgn_technical_phone': None,
 'asgn_technical_phone_ext': None,
 'mail': None,
 'asgn_technical_contact_email': None,
 'foreign_parent_co_name': None,
 'foreign_parent_co_db_num': None,
 'standardized_foreign_parent_company': None}

This record shows all the information for PPG GROW DETROIT with columns and values listed side by side (‘city_name’: ‘DETROIT’). This may seem difficult to deal with, but most programming languages have tools to easily parse JSON data into traditional tables; pd.DataFrame does it automatically when you create a pandas DataFrame from a list of these records.

# Convert TRI data to a DataFrame
tri_df = pd.DataFrame(tri_data)

print(f"Number of facilities fetched: {len(tri_df)}")

tri_df.head
Number of facilities fetched: 789
<bound method NDFrame.head of      tri_facility_id                                     facility_name  \
0    2821WNLNDW486MI  INLAND WATERS POLLUTION CONTROL DETROIT FACILITY   
1    48007GRWGR3155W                                  PPG GROW DETROIT   
2    48029WYNGR700LE                  WAYNE GREASE DIV OF DARLING & CO   
3    4809WCDLLC67SUT                        CADILLAC ASPHALT LLC - DIX   
4    48101FRDMT17000                         FORD MOTOR CO PILOT PLANT   
..               ...                                               ...   
784  4831WNNVTV68115                       INNOVATIVE DESIGN SOLUTIONS   
785  4831WSPRRM675SI                             SUPERIOR MATERIALS 39   
786  4831WTHCRW122SH                PPG COATINGS SERVICES SHELBY PLANT   
787  4831WTRNSF726SC                          TRANSFORM AUTOMOTIVE LLC   
788  48397DTRTR6501E           U.S. ARMY DETROIT ARSENAL AMSTA-IRM-XEM   

             street_address         city_name county_name  \
0         4086 MICHIGAN AVE           DETROIT       WAYNE   
1           14000 STANSBURY           DETROIT       WAYNE   
2                 700 LEIGH           DETROIT       WAYNE   
3              670 S DIX ST           DETROIT       WAYNE   
4        17000 OAKWOOD BLVD        ALLEN PARK       WAYNE   
..                      ...               ...         ...   
784         6801 15 MILE RD  STERLING HEIGHTS      MACOMB   
785            6750 SIMS RD  STERLING HEIGHTS      MACOMB   
786    12020 SHELBY TECH DR   SHELBY TOWNSHIP      MACOMB   
787  7026 STERLING PONDS CT  STERLING HEIGHTS      MACOMB   
788       6501 E 11 MILE RD            WARREN      MACOMB   

    state_county_fips_code state_abbr zip_code region fac_closed_ind  ...  \
0                    26163         MI    48210      5              0  ...   
1                    26163         MI    48227      5              0  ...   
2                    26163         MI    48209      5              0  ...   
3                    26163         MI    48217      5              0  ...   
4                    26163         MI    48101      5              0  ...   
..                     ...        ...      ...    ...            ...  ...   
784                  26099         MI    48312      5              0  ...   
785                  26099         MI    48313      5              0  ...   
786                  26099         MI    48315      5              0  ...   
787                  26099         MI    48312      5              0  ...   
788                  26099         MI    48092      5              1  ...   

    asgn_public_phone_ext epa_registry_id asgn_technical_contact  \
0                    None    110063002218                   None   
1                    None    110041981807                   None   
2                    None    110002116889                   None   
3                    None    110067595668                   None   
4                    None    110000852845                   None   
..                    ...             ...                    ...   
784                  None    110064454077                   None   
785                  None    110015884818                   None   
786                  None    110032732041                   None   
787                  None            None                   None   
788                  None    110006393679                   None   

    asgn_technical_phone asgn_technical_phone_ext  mail  \
0                   None                     None  None   
1                   None                     None  None   
2                   None                     None  None   
3                   None                     None  None   
4                   None                     None  None   
..                   ...                      ...   ...   
784                 None                     None  None   
785                 None                     None  None   
786                 None                     None  None   
787                 None                     None  None   
788                 None                     None  None   

    asgn_technical_contact_email foreign_parent_co_name  \
0                           None                     NA   
1                           None                   None   
2                           None                   None   
3                           None      CRH PUBLIC LTD CO   
4                           None                   None   
..                           ...                    ...   
784                         None                     NA   
785                         None                   None   
786                         None                   None   
787                         None                   None   
788                         None                   None   

    foreign_parent_co_db_num standardized_foreign_parent_company  
0                       None                                None  
1                       None                                None  
2                       None                                None  
3                  219509155                                None  
4                       None                                None  
..                       ...                                 ...  
784                     None                                None  
785                     None                                None  
786                     None                                None  
787                     None                                None  
788                     None                                None  

[789 rows x 48 columns]>

Checking the first few rows, everything looks as it should. We want to geocode the facility locations into a point layer using the latitude and longitude values, so we should start by making sure every facility has them.

We’ll check and remove those without.

# Create a copy of the dataframe to avoid SettingWithCopyWarning
tri_df_clean = tri_df.copy()

# Remove facilities with empty latitude or longitude values
tri_df_clean = tri_df_clean.dropna(subset=['pref_latitude', 'pref_longitude'])

print(f"Number of facilities after removing empty coordinates: {len(tri_df_clean)}")

# Convert latitude and longitude to numeric type
tri_df_clean['pref_latitude'] = pd.to_numeric(tri_df_clean['pref_latitude'], errors='coerce')
tri_df_clean['pref_longitude'] = pd.to_numeric(tri_df_clean['pref_longitude'], errors='coerce')
Number of facilities after removing empty coordinates: 478

Ouch! We lost roughly 300 records due to missing coordinates (789 vs. 478). If this were an important analysis for policy-making or scientific research we would take the time to geocode the listed addresses into lat/long coordinates, but for the purposes of this exercise we’ll move on.
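If we did want to rescue those dropped records, the first step would be assembling a geocodable address string from the columns TRI already provides. A minimal sketch (the full_address helper is our own; a geocoding library such as geopy could then resolve each string to coordinates):

```python
import pandas as pd

def full_address(row):
    """Assemble a single geocodable address string from the TRI columns."""
    return f"{row['street_address']}, {row['city_name']}, {row['state_abbr']} {row['zip_code']}"

# Hypothetical usage on the rows we dropped:
# missing = tri_df[tri_df['pref_latitude'].isna() | tri_df['pref_longitude'].isna()]
# addresses = missing.apply(full_address, axis=1)
# Each string could then be passed to a geocoding service (e.g., Nominatim via geopy).
```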

In my own investigations of this data, I noticed that some longitude coordinates were not negative but had an absolute value (e.g., 83.25) that made sense for the Detroit metro area, so we will flip the sign on those records.

# Function to correct longitude by making it negative if positive
def correct_longitude(lon):
    if lon > 0:
        return -lon
    return lon

# Apply longitude correction
tri_df_clean['pref_longitude'] = tri_df_clean['pref_longitude'].apply(correct_longitude)
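The same correction can be written as a single vectorized expression: every longitude in the Detroit metro should be negative, so forcing each value to -|x| fixes sign-flipped records while leaving correct ones untouched. A sketch on a toy Series:

```python
import pandas as pd

# Forcing every value to -|x| corrects sign-flipped longitudes
# and leaves already-negative values unchanged.
lons = pd.Series([83.25, -83.1, 82.9])
fixed = -lons.abs()
print(fixed.tolist())  # [-83.25, -83.1, -82.9]
```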

Additionally, there are some records with wild longitude values that are not simply missing a negative sign. We’ll identify these outliers using the interquartile range; anything outside the range will be tossed. One of the downsides of the TRI database is that many fields are provided directly by the facility (and therefore subject to errors). Again, if this were a more important analysis we would carefully explore all of these records and possibly geocode them using the address.

Toss the quartile outliers.

# Calculate IQR for longitude
Q1 = tri_df_clean['pref_longitude'].quantile(0.25)
Q3 = tri_df_clean['pref_longitude'].quantile(0.75)
IQR = Q3 - Q1

# Define bounds for outliers
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

lower_bound, upper_bound

# Remove outliers
tri_df_clean = tri_df_clean[(tri_df_clean['pref_longitude'] >= lower_bound) & 
                            (tri_df_clean['pref_longitude'] <= upper_bound)]

print(f"Number of facilities after removing longitude outliers: {len(tri_df_clean)}")
Number of facilities after removing longitude outliers: 473

Thankfully we only lost 5 more records to outliers (478 vs. 473).
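The IQR fence above is worth factoring into a small reusable helper if you plan to screen other columns (latitude, release amounts) the same way. A sketch (the function name is our own):

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# The extreme value 100 falls outside the fences and would be tossed.
lo, hi = iqr_bounds([1, 2, 3, 4, 100])
print(lo, hi)  # -1.0 7.0
```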

Visualizing the Data

The data has been cleaned up a bit, so now let’s create a spatial point object from it with geopandas and use matplotlib to plot the values on top of a basemap of Detroit, which we’ll acquire using contextily and our original Wayne, Macomb, and Oakland counties boundary.

# Create a GeoDataFrame from the cleaned TRI data
detroit_tri = gpd.GeoDataFrame(
    tri_df_clean, 
    geometry=gpd.points_from_xy(tri_df_clean.pref_longitude, tri_df_clean.pref_latitude),
    crs="EPSG:4326"
)

# Reproject data to Web Mercator for contextily
# detroit_metro = detroit_metro.to_crs(epsg=3857)
# bbox_polygon = bbox_polygon.to_crs(epsg=3857)
detroit_tri = detroit_tri.to_crs(epsg=3857)

# Create the plot
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Plot the metro area and bounding box (reusing objects from earlier)
detroit_metro_bm.plot(ax=ax, facecolor='none', edgecolor='blue', linewidth=2)
bbox_polygon_bm.boundary.plot(ax=ax, color='red', linewidth=2)

# Plot TRI facilities (reusing the detroit_tri object from earlier)
detroit_tri.plot(ax=ax, color='purple', edgecolor='grey', markersize=50, alpha=0.5, label='TRI Facilities')
# Plot ICIS-AIR facilities
gdf_icis_air_bm.plot(ax=ax, color='cyan', edgecolor='grey', markersize=50, alpha=0.5, label='ICIS-AIR Facilities')

# Add the basemap
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent of the map to the bounding box
ax.set_xlim(bbox_polygon_bm.total_bounds[0], bbox_polygon_bm.total_bounds[2])
ax.set_ylim(bbox_polygon_bm.total_bounds[1], bbox_polygon_bm.total_bounds[3])

# Remove axes
ax.set_axis_off()

# Add legend
ax.legend()

plt.title("Detroit Metro Area TRI and ICIS-AIR Facilities", fontsize=16)
plt.tight_layout()
plt.show()

print(f"Number of TRI facilities plotted: {len(detroit_tri)}")
print(f"Number of ICIS-AIR facilities plotted: {len(gdf_icis_air)}")

Number of TRI facilities plotted: 473
Number of ICIS-AIR facilities plotted: 903

We can see that there are far fewer facilities tracked in TRI than in ICIS-AIR. Some of this can be attributed to the records we tossed while checking for valid coordinates and outliers, but generally speaking ICIS-AIR will contain far more facilities because it tracks every facility regulated under the Clean Air Act rather than only those that report to TRI.

You may have also noticed that this dataset does not contain any actual pollution data; it only lists facility information for those facilities regulated under TRI.

TRI is made up of numerous tables containing a wealth of information regarding the facilities themselves and all the regulated chemicals that are handled at each site. The TRI_FACILITY table we requested is the starter table used in most of the examples in the Envirofacts API documentation, however, it does not contain any chemical release data.

Navigating the TRI Envirofacts interface can be overwhelming at times. Multiple tables can be queried with a single URL, but for a proper join to be carried out both tables must share at least one common column. In most cases this can be accomplished using TRI_FACILITY_ID. That said, air pollution release data (column name AIR_TOTAL_RELEASE) is only found in the TRI_FORM_R table, which does not contain the FACILITY_ID column.

You can query the V_TRI_FORM_R_EZ table using the API even though it is not documented on the API website. This form contains facility information and AIR_TOTAL_RELEASE reporting, but it only runs through 2021. To get the most recent data (2023) you would have to download the individual tables with separate calls and perform the joins yourself, or use the custom form search web portal. We’ll demonstrate a quick API call for V_TRI_FORM_R_EZ, then move forward with the most recent data available from the custom form search web portal.
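The “separate calls plus joins” route boils down to fetching each table as its own DataFrame and merging on the shared key. A minimal sketch with toy frames standing in for two fetched tables (names and values illustrative):

```python
import pandas as pd

# Toy stand-ins for two separately fetched Envirofacts tables.
facilities = pd.DataFrame({
    'TRI_FACILITY_ID': ['A1', 'B2'],
    'FACILITY_NAME': ['PLANT ONE', 'PLANT TWO'],
})
releases = pd.DataFrame({
    'TRI_FACILITY_ID': ['A1', 'A1', 'B2'],
    'AIR_TOTAL_RELEASE': [100.0, 50.0, 25.0],
})

# Join on the shared key, then aggregate releases per facility.
joined = facilities.merge(releases, on='TRI_FACILITY_ID', how='left')
totals = joined.groupby('FACILITY_NAME', as_index=False)['AIR_TOTAL_RELEASE'].sum()
```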

Toxic Release Inventory: Form R

We can make the query to Envirofacts. Let’s break down the call.

# URL of the CSV file
url = "https://data.epa.gov/efservice/V_TRI_FORM_R_EZ/REPORTING_YEAR/2021/AIR_TOTAL_RELEASE/>/0/STATE_ABBR/MI/COUNTY_NAME/CONTAINING/WAYNE/CONTAINING/OAKLAND/CONTAINING/MACOMB/CSV"
  • Base url: https://data.epa.gov/efservice/
  • The table we’re requesting: V_TRI_FORM_R_EZ/
  • For just 2021: REPORTING_YEAR/2021/
  • Only facilities that report some air release: AIR_TOTAL_RELEASE/>/0/
  • Just Michigan: STATE_ABBR/MI/
  • Our counties: COUNTY_NAME/CONTAINING/WAYNE/CONTAINING/OAKLAND/CONTAINING/MACOMB/
  • We want it as a csv: CSV
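Those slash-delimited segments can also be assembled programmatically, which keeps longer queries readable and repeatable. A sketch (the efservice_url helper is our own invention; the endpoint and segment order follow the URL just shown):

```python
# Hypothetical helper (our own name) to assemble an Envirofacts efservice URL
# from slash-delimited path segments.
def efservice_url(*segments, base="https://data.epa.gov/efservice"):
    return base + "/" + "/".join(str(s) for s in segments)

example_url = efservice_url(
    "V_TRI_FORM_R_EZ", "REPORTING_YEAR", 2021,
    "AIR_TOTAL_RELEASE", ">", 0,
    "STATE_ABBR", "MI", "CSV",
)
```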
# Read the CSV file directly with pandas
tri_form_r = pd.read_csv(url)
print(f"Successfully read CSV. Number of records: {len(tri_form_r)}")

# Display information about the dataset
print("\nColumns in the dataset:")
print(tri_form_r.columns)
Successfully read CSV. Number of records: 413

Columns in the dataset:
Index(['tri_facility_id', 'facility_name', 'street_address', 'city_name',
       'county_name', 'state_county_fips_code', 'state_abbr', 'zip_code',
       'region', 'fac_closed_ind',
       ...
       'additional_text_9_1', 'srs_id', 'media_type', 'prod_ratio_or_activity',
       'industry_code', 'industry_description', 'source', 'method', 'adjusted',
       'covered_naics'],
      dtype='object', length=568)

We have roughly the same number of facilities as before, but this table has a lot of columns (568). We should make it more manageable.

keeps = [
    'facility_name', 'street_address', 'city_name',
    'reporting_year', 'air_total_release']

tri_form_r = tri_form_r[keeps]

This works well, but we can get more recent data that is already filtered by using the web portal. You select the location, years of interest, and columns of interest, and the portal serves up the requested dataset in Amazon S3 cloud storage. We can import the dataset directly from the address it provides.

tri_form_r = pd.read_csv("https://dmap-epa-enviro-prod-export.s3.amazonaws.com/396975438.CSV")

# Display information about the dataset
print(f"Successfully read CSV. Number of records: {len(tri_form_r)}")
print("\nColumns in the dataset:")
print(tri_form_r.columns)
Successfully read CSV. Number of records: 994

Columns in the dataset:
Index(['AIR_TOTAL_RELEASE', 'CHEM_NAME', 'FACILITY_NAME', 'LATITUDE',
       'LONGITUDE', 'STREET_ADDRESS', 'TRI_FACILITY_ID'],
      dtype='object')

This is a bit more manageable, but we have multiple records per facility because each chemical release has a separate record. This allows the user to investigate specific chemicals, but for the purposes of this exercise we will aggregate AIR_TOTAL_RELEASE for each facility.
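Before collapsing to facility totals, the chemical-level detail can be explored with a groupby on CHEM_NAME. A sketch on a few toy rows shaped like the portal export (values illustrative):

```python
import pandas as pd

# Toy rows shaped like the portal export (values illustrative).
sample = pd.DataFrame({
    'CHEM_NAME': ['Lead compounds', 'Lead compounds', 'n-Butyl alcohol'],
    'AIR_TOTAL_RELEASE': [1.0, 282.6, 1660.0],
})

# Total pounds released per chemical, largest first.
per_chem = (sample.groupby('CHEM_NAME')['AIR_TOTAL_RELEASE']
                  .sum()
                  .sort_values(ascending=False))
```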

# Check how it looks
tri_form_r.head
<bound method NDFrame.head of      AIR_TOTAL_RELEASE                       CHEM_NAME  \
0            1660.0000                 n-Butyl alcohol   
1               0.0000                            Lead   
2            3306.0000          Methyl isobutyl ketone   
3               0.8990               Cadmium compounds   
4               1.0285                  Lead compounds   
..                 ...                             ...   
989           523.0000          Methyl isobutyl ketone   
990           282.6000                  Lead compounds   
991             0.0000  Aluminum oxide (fibrous forms)   
992          2715.0000           Certain glycol ethers   
993             0.0000                  Sodium nitrite   

                                         FACILITY_NAME   LATITUDE  LONGITUDE  \
0             GENERAL MOTORS LLC ORION ASSEMBLY CENTER  42.715800 -83.260700   
1                              WIELAND NATIONAL BRONZE  42.502610 -82.967980   
2    AXALTA COATING SYSTEMS USA LLC-MOUNT CLEMENS P...  42.614330 -82.888360   
3                                   WAYNE DISPOSAL INC  42.219170 -83.522691   
4                                   WAYNE DISPOSAL INC  42.219170 -83.522691   
..                                                 ...        ...        ...   
989                 FORD MOTOR CO DEARBORN TRUCK PLANT  42.302500 -83.154167   
990         CLEVELAND-CLIFFS STEEL CORP DEARBORN WORKS  42.301742 -83.162934   
991                                     EQ DETROIT INC  42.365910 -83.047740   
992                           FLAT ROCK ASSEMBLY PLANT  42.102470 -83.243560   
993                    US STEEL CORP GREAT LAKES WORKS  42.277400 -83.110300   

              STREET_ADDRESS  TRI_FACILITY_ID  
0           4555 GIDDINGS RD  48055GNRLM4555G  
1             28070 HAYES RD  4806WWLNDN287HA  
2          400 GROESBECK HWY  48043DPNTM400GR  
3    49350 N I-94 SERVICE DR  48111WYNDS49350  
4    49350 N I-94 SERVICE DR  48111WYNDS49350  
..                       ...              ...  
989           3001 MILLER RD  48121FRDM23001M  
990           4001 MILLER RD  48121RGSTL3001M  
991        1923 FREDERICK ST  48211SLCTY1923F  
992       1 INTERNATIONAL DR  48134MZDMT1MAZD  
993             1 QUALITY DR  48229GRTLKNO1QU  

[994 rows x 7 columns]>

Let’s aggregate it.

# Sum across individual facilities
tri_form_r = tri_form_r.groupby(['FACILITY_NAME', 'LATITUDE','LONGITUDE','STREET_ADDRESS'], as_index=False).agg({
    'AIR_TOTAL_RELEASE': 'sum'
})
# How many records are there now
print(f"Successfully read CSV. Number of records: {len(tri_form_r)}")
Successfully read CSV. Number of records: 211

There were only 211 unique listings in the 2023 data.

As before, we should check for valid coordinates and release data.

# Assuming the latitude and longitude columns are named 'LATITUDE' and 'LONGITUDE'
# Adjust these names if they're different in your CSV
lat_col = 'LATITUDE'
lon_col = 'LONGITUDE'
release_col = 'AIR_TOTAL_RELEASE'  # Adjust this to the actual column name for air releases

# Remove records with missing coordinates or air release data
df_tri_clean = tri_form_r.dropna(subset=[lat_col, lon_col, release_col])

print(f"\nNumber of records after removing missing data: {len(df_tri_clean)}")

Number of records after removing missing data: 211

There were no missing coordinates or release data.

Now we can create a spatial object with GeoPandas using the latitude and longitude coordinates in the table.

# Create a GeoDataFrame
gdf_tri_form_r = gpd.GeoDataFrame(
    df_tri_clean, 
    geometry=gpd.points_from_xy(df_tri_clean[lon_col], df_tri_clean[lat_col]),
    crs="EPSG:4326"
)

Let’s check the distribution of the air release data. I suspect there will be outliers.

# Create the histogram
plt.figure(figsize=(8, 8))
plt.hist(gdf_tri_form_r['AIR_TOTAL_RELEASE'], bins=10, edgecolor='black')
plt.title('Histogram of Air Total Release Sum')
plt.xlabel('Air Sum')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)

# Show the plot
plt.show()

Looks like there are outliers around 100,000 lbs and 400,000 lbs. Before we map the data, it would be a good idea to perform a log transformation of AIR_TOTAL_RELEASE to smooth out the visuals. You can’t take the log of negative or zero values, so we’ll use the log1p function, which adds 1 to every value before taking the log.

# Assuming 'result' is your DataFrame from earlier
gdf_tri_form_r['LOG_AIR_RELEASE'] = np.log1p(gdf_tri_form_r['AIR_TOTAL_RELEASE'])

# Create the histogram
plt.figure(figsize=(8, 8))
plt.hist(gdf_tri_form_r['LOG_AIR_RELEASE'], bins=10, edgecolor='black')
plt.title('Histogram of the Log of Air Total Release Sum')
plt.xlabel('Log of Air Sum')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.3)

# Show the plot
plt.show()

That looks much better.
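One caveat of working on the log scale: to report values back in pounds, invert log1p with np.expm1. For example:

```python
import numpy as np

pounds = np.array([0.0, 523.0, 2715.0])
logged = np.log1p(pounds)     # log(1 + x); safe at zero
recovered = np.expm1(logged)  # exact inverse: exp(y) - 1

# The round trip recovers the original pounds (up to floating-point error).
print(np.allclose(recovered, pounds))  # True
```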

Now we can visualize the data by overlaying the points on a basemap of Detroit, using graduated symbols to illustrate the amount of air releases for a given facility.

# Reproject to Web Mercator to match the basemap
gdf_tri_form_r_bm = gdf_tri_form_r.to_crs(epsg=3857)

# Create the plot
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Plot the metro area and bounding box (reusing objects from earlier)
detroit_metro_bm.plot(ax=ax, facecolor='none', edgecolor='blue', linewidth=2)
bbox_polygon_bm.boundary.plot(ax=ax, color='orangered', linewidth=2)

# Plot TRI facilities with graduated symbols based on air releases
scatter = ax.scatter(gdf_tri_form_r_bm.geometry.x, gdf_tri_form_r_bm.geometry.y, 
                     s=gdf_tri_form_r_bm['LOG_AIR_RELEASE']*20,  # Adjust the scaling factor as needed
                     c='orangered',  # Static fill color
                     edgecolor='yellow',  # Outline color
                     linewidth=1,  # Adjust the outline width as needed
                     alpha=0.7)

# Add the basemap
ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent of the map to the bounding box
ax.set_xlim(bbox_polygon_bm.total_bounds[0], bbox_polygon_bm.total_bounds[2])
ax.set_ylim(bbox_polygon_bm.total_bounds[1], bbox_polygon_bm.total_bounds[3])

# Remove axes
ax.set_axis_off()

# Add a legend for symbol sizes
legend_sizes = [0,4,8,12]  # Example sizes, adjust based on your data
legend_elements = [plt.scatter([], [], s=size*20, c='orangered', edgecolor='yellow', 
                               linewidth=1, alpha=1, label=f'{size:,}') 
                   for size in legend_sizes]
ax.legend(handles=legend_elements, title='Log Total Air Releases (lbs)', 
          loc='lower right', title_fontsize=12, fontsize=10)

plt.title("Detroit Metro Area TRI Facilities - Total Air Releases (Custom Data)", fontsize=16)
plt.tight_layout()
plt.show()

print(f"\nNumber of TRI facilities plotted: {len(gdf_tri_form_r)}")
print(f"Total air releases: {gdf_tri_form_r[release_col].sum():,.2f} lbs")
print(f"Average air release per facility: {gdf_tri_form_r[release_col].mean():,.2f} lbs")


Number of TRI facilities plotted: 211
Total air releases: 2,117,167.80 lbs
Average air release per facility: 10,033.97 lbs

The point source air release data provides a good snapshot of the locations and relative amounts of chemical releases. With some basic understanding of the demographic dynamics of Detroit, one might theorize about who is being subjected to high levels of pollution; however, we’ll need additional data to develop any kind of systematic analysis.

Social Vulnerability Index

NASA’s Social Vulnerability Index (SVI) raster dataset, available through the Socioeconomic Data and Applications Center (SEDAC), is a high-resolution geospatial resource that quantifies social vulnerability across the United States. This dataset is based on the CDC’s Social Vulnerability Index but is presented in a raster format, providing continuous coverage at a 250-meter resolution. The SVI incorporates various socioeconomic and demographic factors such as poverty, lack of vehicle access, crowded housing, and minority status to assess communities’ capacity to prepare for, respond to, and recover from hazards, including environmental threats like poor air quality.

The raster format allows for more detailed spatial analysis and integration with other environmental datasets. This makes it particularly valuable for researchers and policymakers studying the intersection of social vulnerability and environmental risks, such as air pollution exposure. By overlaying this SVI data with air quality information, for instance, analysts can identify areas where socially vulnerable populations may be disproportionately affected by poor air quality, supporting environmental justice initiatives and targeted intervention strategies.

SEDAC, as part of NASA’s Earth Observing System Data and Information System (EOSDIS), hosts this dataset along with other socioeconomic and environmental data, facilitating interdisciplinary research on human-environment interactions. The SVI raster dataset’s high resolution and comprehensive coverage make it a powerful tool for assessing environmental equity and informing policy decisions at various geographic scales.

Although the SVI dataset is free to acquire, it does require an Earth Data account. If you don’t already have an Earth Data account, you can follow these steps to create one and download the SVI dataset to your local computer:

  1. Visit the NASA Earthdata website: Go to https://urs.earthdata.nasa.gov/ and click the “Register” button in the top right corner of the page.
  2. Fill out the registration form: Provide the required information, including your name, email address, and a password. You’ll also need to create a username.
  3. Verify your email: NASA will send a verification email to the address you provided. Click on the link in this email to confirm your account.
  4. Log in to Earth Data: Once your account is verified, you can log in using your username and password.
  5. Access SEDAC: Visit the SEDAC website (https://sedac.ciesin.columbia.edu/) and use your Earth Data credentials to log in when prompted.
  6. Download the data: Once you’ve found the SVI dataset, you can use your Earth Data account to download it.

Remember, your Earth Data account gives you access not just to SEDAC, but to a wide range of NASA Earth science data. It’s a valuable resource for researchers, students, and anyone interested in environmental and socioeconomic data.

Data Processing

Once you’ve downloaded the dataset to your working directory, you can proceed with the analysis. We’ll be using the 2020 census tract version of SVI.

The different layers of SVI are provided as individual files, but sometimes it’s easier to work with a multilayer object. We can create one using xarray. To begin, we’ll read each file individually, clip it to our Detroit metro boundary, and create an individual data array.

# Specify the TIF files
tif_files = [
    "data/svi/svi_2020_tract_overall_wgs84.tif",
    "data/svi/svi_2020_tract_minority_wgs84.tif",
    "data/svi/svi_2020_tract_socioeconomic_wgs84.tif",
    "data/svi/svi_2020_tract_housing_wgs84.tif",
    "data/svi/svi_2020_tract_household_wgs84.tif"
]

# Create an empty list to store the individual DataArrays
data_arrays = []

# Read each TIF file, clip it to Detroit metro's extent, and append it to the list
for file in tif_files:
    with rasterio.open(file) as src:
        # Reproject Detroit metro boundary to match the raster CRS
        metro_reprojected = detroit_metro.to_crs(src.crs)
        
        # Clip the raster to Detroit metro's geometry
        out_image, out_transform = rasterio.mask.mask(src, metro_reprojected.geometry, crop=True)
        out_meta = src.meta.copy()
        
        # Update the metadata
        out_meta.update({"driver": "GTiff",
                         "height": out_image.shape[1],
                         "width": out_image.shape[2],
                         "transform": out_transform})
        
        # Create coordinates
        height = out_meta['height']
        width = out_meta['width']
        cols, rows = np.meshgrid(np.arange(width), np.arange(height))
        xs, ys = rasterio.transform.xy(out_transform, rows, cols)
        
        # Convert lists to numpy arrays
        xs = np.array(xs)
        ys = np.array(ys)
        
        # Reshape coordinates to match dimensions of the raster
        xs = xs.reshape(height, width)
        ys = ys.reshape(height, width)
        
        # Create a DataArray from the clipped data
        da = xr.DataArray(out_image[0],  # Use the first band
                          coords={'y': ('y', ys[:, 0]),
                                  'x': ('x', xs[0, :])},
                          dims=['y', 'x'])
        da.attrs['crs'] = str(src.crs)  # Convert CRS to string
        da.attrs['transform'] = out_transform
        data_arrays.append(da)
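As a quick sanity check of the coordinate step above: np.meshgrid over the column and row index ranges returns two (height, width) arrays, one holding each pixel’s column index and one its row index (toy 3-row, 2-column raster):

```python
import numpy as np

width, height = 2, 3  # toy raster: 2 columns, 3 rows
cols, rows = np.meshgrid(np.arange(width), np.arange(height))

print(cols.shape)     # (3, 2): one entry per pixel
print(cols.tolist())  # [[0, 1], [0, 1], [0, 1]] - column index of each pixel
print(rows.tolist())  # [[0, 0], [1, 1], [2, 2]] - row index of each pixel
```

Pairing these index grids with rasterio.transform.xy is what yields the x/y coordinate of every cell.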

Now we can combine them and give the layers “pretty” names.

# Combine all DataArrays into a single DataSet
svi_detroit = xr.concat(data_arrays, dim='layer')

# Rename the layers
layer_names = ['Overall', 'Minority', 'Socioeconomic', 'Housing', 'Household']
svi_detroit = svi_detroit.assign_coords(layer=('layer', layer_names))
svi_detroit
<xarray.DataArray (layer: 5, y: 105, x: 119)>
array([[[-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        ...,
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38]],

       [[-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
...
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38]],

       [[-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        ...,
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38],
        [-3.4e+38, -3.4e+38, -3.4e+38, ..., -3.4e+38, -3.4e+38,
         -3.4e+38]]], dtype=float32)
Coordinates:
  * y        (y) float64 42.9 42.89 42.88 42.87 ... 42.05 42.05 42.04 42.03
  * x        (x) float64 -83.69 -83.68 -83.67 -83.66 ... -82.72 -82.71 -82.7
  * layer    (layer) <U13 'Overall' 'Minority' ... 'Housing' 'Household'
Attributes:
    crs:        EPSG:4326
    transform:  | 0.01, 0.00,-83.69|\n| 0.00,-0.01, 42.90|\n| 0.00, 0.00, 1.00|
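Note the large negative values (about -3.4e+38) in the printout above: these are the GeoTIFF nodata fill values, not real SVI scores. If you compute statistics on the array, mask them to NaN first; a minimal numpy sketch (the same logic applies elementwise to the DataArray, e.g. via xr.where):

```python
import numpy as np

NODATA_THRESHOLD = -1e30  # anything below this is treated as nodata

def mask_nodata(arr, threshold=NODATA_THRESHOLD):
    """Replace nodata fill values with NaN so nan-aware stats ignore them."""
    arr = np.asarray(arr, dtype="float64")
    return np.where(arr < threshold, np.nan, arr)

# Two valid SVI cells surrounded by fill values
demo = np.array([[-3.4e38, 0.25], [0.75, -3.4e38]], dtype="float32")
masked = mask_nodata(demo)
print(np.nanmean(masked))  # 0.5 - mean of the two valid cells
```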

Now we can plot each layer.

# Define the colorbar limits (SVI is a 0-1 scale)
vmin, vmax = 0, 1

# Create a multipanel plot
fig, axes = plt.subplots(3, 2, figsize=(8, 10))
axes = axes.flatten()

# Plot each layer
for i, layer in enumerate(svi_detroit.layer.values):
    # Plot with custom color limits
    im = svi_detroit.sel(layer=layer).plot(ax=axes[i], add_colorbar=False, vmin=vmin, vmax=vmax, cmap='plasma')
    axes[i].set_title(layer)
    # Remove axes and labels
    axes[i].axis('off')
    
    # Plot Detroit metro boundary
    metro_reprojected.boundary.plot(ax=axes[i], color='red', linewidth=1)

# Remove the extra subplot
fig.delaxes(axes[5])

# Add a single colorbar
cbar_ax = fig.add_axes([0.92, 0.15, 0.02, 0.7])
cbar = fig.colorbar(im, cax=cbar_ax, label='SVI Score', 
                    fraction=0.047, pad=0.04, aspect=20)

plt.tight_layout()
plt.show()

This provides an excellent look at demographic trends in the Detroit metro area. Overall vulnerability is highest in the inner city. The most striking drivers are the concentration of minorities and socioeconomic vulnerability in the downtown area. The housing and household components are slightly more varied throughout the region.

Integrating TRI and SVI

There are numerous ways to assess the relationships between the SVI and TRI data. For this lesson we’ll identify areas where air releases and vulnerability are highest together by creating a new rasterized index that combines both layers.

To begin we need to rasterize the air release point data. We will create an empty raster grid and “fill” the cells with the sum of the air release values from any points that fall within each cell.

Start by getting the bounds from our boundary object, specify the desired resolution, and set up the transform.

# Get the bounds of the Detroit metro area
minx, miny, maxx, maxy = detroit_metro_bm.total_bounds

# Define the cell resolution (5 km / 5000 m)
resolution = 5000

# Calculate the number of cells
nx = int((maxx - minx) / resolution)
ny = int((maxy - miny) / resolution)

# Create the transform for the raster
transform = from_origin(minx, maxy, resolution, resolution)
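from_origin(minx, maxy, xres, yres) encodes the affine mapping from pixel indices to map coordinates: x = minx + col * xres and y = maxy - row * yres, with y decreasing downward from the top edge. A quick arithmetic check with toy numbers:

```python
# Toy check of the mapping that from_origin(minx, maxy, xres, yres) encodes.
minx, maxy, resolution = 0.0, 10000.0, 5000.0

def pixel_to_xy(col, row):
    """Upper-left corner of pixel (col, row) in map coordinates."""
    return (minx + col * resolution, maxy - row * resolution)

print(pixel_to_xy(0, 0))  # (0.0, 10000.0): top-left corner of the grid
print(pixel_to_xy(1, 2))  # (5000.0, 0.0)
```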

Now we have to prepare the point values and rasterize them. A key thing to note here is the merge_alg=rasterio.enums.MergeAlg.add argument passed to the rasterize function. This ensures that cells containing more than one point sum their values (the default is to simply replace).

from rasterio import features

# Prepare geometries and values for rasterization
shapes = ((geom, value) for geom, value in zip(gdf_tri_form_r_bm.geometry, gdf_tri_form_r_bm.AIR_TOTAL_RELEASE))

# Rasterize the point data
air_release_raster = features.rasterize(shapes=shapes, 
                            out_shape=(ny, nx), 
                            transform=transform, 
                            fill=0, 
                            all_touched=True, 
                            merge_alg=rasterio.enums.MergeAlg.add)
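To make the add behavior concrete, here is a pure-Python sketch of the same idea (summing point values into grid cells) using hypothetical coordinates; it mimics, but is not, rasterio's implementation:

```python
from collections import defaultdict

def bin_points(points, minx, maxy, resolution, nx, ny):
    """Sum point values into raster cells, mimicking MergeAlg.add.

    points: iterable of (x, y, value) tuples.
    """
    grid = defaultdict(float)
    for x, y, value in points:
        col = int((x - minx) // resolution)
        row = int((maxy - y) // resolution)  # row 0 sits at the top edge
        if 0 <= col < nx and 0 <= row < ny:
            grid[(row, col)] += value  # add, rather than replace
    return grid

# Two points landing in the same 5000 m cell sum together
cells = bin_points([(100, 4900, 10.0), (200, 4800, 5.0)], 0, 5000, 5000, 1, 1)
print(cells[(0, 0)])  # 15.0
```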

We rasterized with rasterio, but we need to convert this to an xarray object to continue.

# Convert the raster to an xarray DataArray
# Note: We use ny and nx here to ensure the coordinates match the raster shape
air_release_raster_da = xr.DataArray(air_release_raster, 
                         coords={'y': np.linspace(maxy, miny, ny),
                                 'x': np.linspace(minx, maxx, nx)},
                         dims=['y', 'x'])
air_release_raster_da.rio.write_crs(detroit_metro_bm.crs, inplace=True)
<xarray.DataArray (y: 26, x: 21)>
array([[0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        4.9699998e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 5.1500000e+02, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 4.9599999e-01, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 4.0000000e+01,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 3.5800001e-01,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
...
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 6.8400000e+02, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 1.9386000e+04, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00],
       [0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
        0.0000000e+00]], dtype=float32)
Coordinates:
  * y            (y) float64 5.296e+06 5.291e+06 ... 5.17e+06 5.165e+06
  * x            (x) float64 -9.316e+06 -9.311e+06 ... -9.212e+06 -9.207e+06
    spatial_ref  int32 0

The rasterized grid covers the full rectangular bounding box of the metro area, so cells that fall outside the county borders hold zeros rather than real data. We can clip the raster to our Detroit boundary to fix this.

# Clip the raster with the Detroit metro boundary
air_release_raster_da = air_release_raster_da.rio.clip(
    detroit_metro_bm.geometry.values, 
    detroit_metro_bm.crs, 
    drop=False, all_touched=True)

Now we can see how our rasterized air release values look.

# Define the breaks for the discrete scale
breaks = [0, 1, 10, 100, 1000, 10000, 100000, 250000, 500000]

# Create a custom colormap
colors = ['#fffff3', '#FFFFCC', '#FFEDA0', '#FED976', '#FEB24C', '#FD8D3C', '#FC4E2A', '#E31A1C', '#B10026']
cmap = ListedColormap(colors)

# Create a normalization based on the breaks
norm = BoundaryNorm(breaks, cmap.N)

# Create the plot
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Plot the TRI facility points
gdf_tri_form_r_bm.plot(ax=ax, color='blue', markersize=10, alpha=0.7)

# Plot the clipped raster with the custom colormap and norm
im = ax.imshow(air_release_raster_da, extent=[minx, maxx, miny, maxy], origin='upper', 
               cmap=cmap, norm=norm)

# Add colorbar with discrete labels
cbar = plt.colorbar(im, ax=ax, extend='max', 
                    label='Total Air Releases (pounds)', 
                    ticks=breaks, 
                    fraction=0.047, pad=0.04, aspect=20)
cbar.ax.set_yticklabels([f'{b:,}' for b in breaks])

# Plot the Detroit metro boundary
detroit_metro_bm.boundary.plot(ax=ax, color='black', linewidth=2)

# Set the extent to match the Detroit metro area
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)

# Add title and labels
ax.set_title('TRI Air Total Release (5 km resolution sum) with Facility Locations', fontsize=16)
ax.set_xlabel('X Coordinate')
ax.set_ylabel('Y Coordinate')

plt.tight_layout()
plt.show()

print(f"Number of TRI facilities plotted: {len(gdf_tri_form_r_bm)}")
print(f"Total air releases: {gdf_tri_form_r_bm['AIR_TOTAL_RELEASE'].sum():,.2f}")
print(f"Maximum cell value in raster: {air_release_raster_da.max().values:,.2f}")

Number of TRI facilities plotted: 211
Total air releases: 2,117,167.80
Maximum cell value in raster: 434,344.84

The raster looks as expected, with the highest values in locations where our previous map had the largest circles. We overlaid the points to show the sources of the values. The cell resolution is set to 5000 m. This was selected so we can evaluate the map and the process we used to create the raster, but any value could be used depending on the intended application.

Correlation Structure

High concentrations of air releases in our map do appear to coincide with areas of high social vulnerability, but the relationship might not be as systematic or widespread as it appears from comparing the two maps.

We can extract the values from the two rasters, flatten them into 1-D arrays, and test the relationship directly.

# Calculate raster correlation between SVI overall and air release within Detroit metro boundary
svi_flat = svi_reprojected.values.flatten()
air_flat = air_release_disaggregated.values.flatten()

# Remove NaN values
mask = ~np.isnan(svi_flat) & ~np.isnan(air_flat)
svi_flat = svi_flat[mask]
air_flat = air_flat[mask]

# Filter air data to include only releases of at least 1,000 pounds
mask = (air_flat >= 1000)
svi_filtered = svi_flat[mask]
air_filtered = air_flat[mask]

# Apply log1p transformation to air release data
air_filtered_log = np.log1p(air_filtered)

# Filter data to include only SVI scores between 0 and 1
mask = (svi_filtered >= 0) & (svi_filtered <= 1)
svi_filtered = svi_filtered[mask]
air_filtered_log = air_filtered_log[mask]

# Create the scatter plot
plt.figure(figsize=(8, 8))
plt.scatter(svi_filtered, air_filtered_log, alpha=0.75, s=10)

# Set labels and title
plt.xlabel('Social Vulnerability Index (SVI)')
plt.ylabel('Log(Air Release + 1)')
plt.title('SVI vs Log(Air Release + 1) Scatter Plot')

# Add a grid for better readability
plt.grid(True, linestyle='--', alpha=0.7)

# Set x-axis limits explicitly to 0-1
plt.xlim(0, 1)

# Set y-axis limits to min and max of transformed air release data
plt.ylim(np.min(air_filtered_log), np.max(air_filtered_log))

# Adjust layout and display the plot
plt.tight_layout()
plt.show()

Run a Pearson’s correlation.

# Perform correlation analysis
svi_air_correlation, p_value_points = stats.pearsonr(svi_filtered, air_filtered_log)

print(f"Raster Correlation between SVI and Air Releases: {svi_air_correlation:.4f}")
print(f"P-value: {p_value_points:.4f}")
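For reference, Pearson’s r is just the covariance of the two variables divided by the product of their standard deviations; a small numpy sketch of what stats.pearsonr computes (without the p-value):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance / (std_x * std_y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Perfectly linear data correlates at exactly 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```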

Air Release Vulnerability Index

Now that we’ve created our air release raster layer we can combine it with the SVI raster to create a new index identifying areas with the highest combination of air releases and vulnerability.

Start by selecting just the SVI Overall layer and converting it to a rioxarray object so we can perform raster calculations. We’ll also ensure the two raster layers “line up” by reprojecting them into the same coordinate reference system and matching their resolutions.

# Select the 'Overall' layer
svi_overall = svi_detroit.sel(layer='Overall')

# Convert to rioxarray for geospatial operations
svi_overall = svi_overall.rio.write_crs("EPSG:4326")

# Reproject SVI to match the CRS of the air release raster
svi_reprojected = svi_overall.rio.reproject_match(air_release_raster_da)

# Resample the air release data onto the grid of the reprojected SVI data
air_release_disaggregated = air_release_raster_da.rio.reproject_match(
    svi_reprojected,
    resampling=Resampling.bilinear
)

There are a few more steps that will aid in interpretation.

  1. We will take the log of the air release data to reduce the impact of major outliers.
  2. We will scale the logged air release data to 0-1 to match the SVI data. Now the resulting index we create will have equal contributions from both datasets.
# Log1p transform the air release data and scale to 0-1
air_release_log = np.log1p(air_release_disaggregated)
air_release_scaled = (air_release_log - air_release_log.min()) / (air_release_log.max() - air_release_log.min())

# Multiply scaled air release data with SVI data
vulnerability_indicator = air_release_scaled * svi_reprojected
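As a toy numeric check of the two steps (hypothetical values): the log1p transform compresses the large outlier, and min-max scaling puts the result on the same 0-1 footing as SVI before the two layers are multiplied.

```python
import numpy as np

air = np.array([0.0, 9.0, 99.0, 999.0])  # hypothetical release totals (pounds)
svi = np.array([0.2, 0.9, 0.5, 1.0])     # hypothetical SVI scores

logged = np.log1p(air)                                            # compress outliers
scaled = (logged - logged.min()) / (logged.max() - logged.min())  # rescale to 0-1
index = scaled * svi                                              # combine the layers
print(index.round(3))
```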

Now let’s visualize our index along with the inputs.

# Create the plots
fig, axs = plt.subplots(3, 1, figsize=(8, 20))

# Plot SVI Overall
im1 = svi_reprojected.plot(ax=axs[0], cmap='viridis', vmin=0, vmax=1, add_colorbar=False)
plt.colorbar(im1, ax=axs[0], label='SVI Overall', 
                    fraction=0.047, pad=0.04, aspect=20)
axs[0].set_title('Social Vulnerability Index (Overall)', fontsize=16)
detroit_metro.boundary.plot(ax=axs[0], color='black', linewidth=2)

# Plot Original Air Release (log-transformed for better visualization)
im2 = np.log1p(air_release_disaggregated).plot(ax=axs[1], cmap='YlOrRd', add_colorbar=False)
plt.colorbar(im2, ax=axs[1], label='Log(Air Release + 1)', 
                    fraction=0.047, pad=0.04, aspect=20)
axs[1].set_title('Air Release (Log-transformed)', fontsize=16)
detroit_metro.boundary.plot(ax=axs[1], color='black', linewidth=2)

# Plot Air Release Vulnerability Indicator
im3 = vulnerability_indicator.plot(ax=axs[2], cmap='YlOrRd', vmin=0, vmax=1, add_colorbar=False)
plt.colorbar(im3, ax=axs[2], label='Air Release Vulnerability Indicator', 
                    fraction=0.047, pad=0.04, aspect=20)
axs[2].set_title('Air Release Vulnerability Indicator\n(Scaled Air Release * SVI)', fontsize=16)
detroit_metro.boundary.plot(ax=axs[2], color='black', linewidth=2)

for ax in axs:
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    ax.set_xlim(svi_reprojected.x.min(), svi_reprojected.x.max())
    ax.set_ylim(svi_reprojected.y.min(), svi_reprojected.y.max())

plt.tight_layout()
plt.show()

# Print some statistics
print(f"Maximum vulnerability indicator: {vulnerability_indicator.max().values:.4f}")

Maximum vulnerability indicator: 0.9773

As one would expect, there are several areas in the downtown river corridor with very high combinations of SVI and TRI releases. The maximum score for our index is 0.9773! That is an extremely vulnerable community facing extraordinary levels of air pollution.

We can get a better sense of where these communities are by extracting the location of the cells with the highest scores and placing them on a basemap of Detroit.

We’ll convert the raster into a data frame, sort the values in descending order, and then extract the coordinates.

# Convert the vulnerability indicator to a pandas DataFrame
vulnerability_df = vulnerability_indicator.to_dataframe(name='index').reset_index()

# Sort by index value and get the top 10
top_10 = vulnerability_df.sort_values('index', ascending=False).head(10)

# Create points from the coordinates
top_10['geometry'] = gpd.points_from_xy(top_10.x, top_10.y)
top_10_gdf = gpd.GeoDataFrame(top_10, geometry='geometry', crs=vulnerability_indicator.rio.crs)
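The sort-then-head step above is a top-k selection; the same pattern with the standard library, using a few hypothetical (x, y, value) triples:

```python
import heapq

# Hypothetical (x, y, index value) cells
cells = [(-83.10, 42.30, 0.98), (-83.20, 42.28, 0.42), (-83.05, 42.31, 0.91)]

# Keep the 2 cells with the largest index values
top2 = heapq.nlargest(2, cells, key=lambda cell: cell[2])
print([cell[2] for cell in top2])  # [0.98, 0.91]
```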

Now let’s make a map.

# Create the final map
fig, ax = plt.subplots(figsize=(8, 8))

# Plot the Detroit metro boundary
detroit_metro_bm.boundary.plot(ax=ax, color='black', linewidth=2)

# Plot the top 10 points
top_10_gdf.plot(ax=ax, color='blue', markersize=100, alpha=0.7)

# Add rank labels to the points (1 = highest index value)
for rank, (idx, row) in enumerate(top_10_gdf.iterrows(), 1):
    ax.annotate(f"#{rank}", (row.geometry.x, row.geometry.y), 
                xytext=(3, 3), textcoords="offset points", 
                color='black', fontweight='bold')

# Add a basemap
ctx.add_basemap(ax, crs=vulnerability_indicator.rio.crs.to_string(), source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent to match the Detroit metro area
ax.set_xlim(vulnerability_indicator.x.min(), vulnerability_indicator.x.max())
ax.set_ylim(vulnerability_indicator.y.min(), vulnerability_indicator.y.max())

ax.set_title('Top 10 Areas with Highest Air Release Vulnerability Index', fontsize=16)
ax.set_axis_off()

plt.tight_layout()
plt.show()

# Print the row index and index value of the top 10 points
print("Row index and index value of the top 10 points:")
for i, (idx, row) in enumerate(top_10_gdf.iterrows(), 1):
    print(f"Row index = {idx}, Index value = {round(row['index'], 2)}")

Row index and index value of the top 10 points:
Row index = 347, Index value = 0.98
Row index = 134, Index value = 0.97
Row index = 155, Index value = 0.97
Row index = 306, Index value = 0.96
Row index = 369, Index value = 0.96
Row index = 305, Index value = 0.94
Row index = 285, Index value = 0.94
Row index = 330, Index value = 0.93
Row index = 348, Index value = 0.92
Row index = 324, Index value = 0.9

As you would expect looking at the input layers, the most vulnerable and polluted areas are along the river corridor in Lincoln Park, Melvindale, Mexicantown, and River Rouge. There are multiple locations with index values greater than 0.90. Information like this can be valuable for targeted relief or mitigation programs.

CDC PLACES

The CDC PLACES (Population Level Analysis and Community Estimates) dataset is a collaboration between the Centers for Disease Control and Prevention (CDC), the Robert Wood Johnson Foundation, and the CDC Foundation. It provides model-based population-level analysis and community estimates of health indicators for all counties, places (incorporated and census designated places), census tracts, and ZIP Code Tabulation Areas (ZCTAs) across the United States. Some key points to consider when working with CDC PLACES:

  1. Spatial Extent: Entire United States, including all 50 states, the District of Columbia, and Puerto Rico.
  2. Spatial Resolution: Multiple levels including counties, cities/towns, census tracts, and ZIP codes.
  3. Indicators: Wide range of chronic disease measures related to health outcomes, prevention, and health risk behaviors.
  4. Data Sources:
    • Behavioral Risk Factor Surveillance System (BRFSS)
    • U.S. Census Bureau’s American Community Survey (ACS)
  5. Methodology: Uses small area estimation methods for small geographic areas.
  6. Health Measures Include:
    • Chronic diseases: e.g., asthma, COPD, heart disease, diabetes
    • Health risk behaviors: e.g., smoking, physical inactivity, binge drinking
    • Prevention practices: e.g., health insurance coverage, dental visits, cholesterol screening
  7. Socioeconomic Data: Includes some socioeconomic and demographic variables.
  8. Annual Updates: Released annually, providing recent estimates for local areas.

This dataset is particularly valuable for public health researchers, policymakers, and community organizations. It provides a standardized way to compare health indicators across different geographic areas and can be used to inform targeted interventions and policy decisions, especially in addressing health disparities at a local level.

As with ICIS-AIR and TRI, PLACES is available through a web search interface and a programmatic API. We will focus on the API in this example.
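PLACES is served through the Socrata Open Data API, so queries are expressed as SoQL parameters such as $where and $limit. As a sketch, the county filter used below can be assembled and URL-encoded like this (endpoint and field names match the request that follows):

```python
from urllib.parse import urlencode

counties = ['Wayne', 'Oakland', 'Macomb']
county_filter = " OR ".join(f"countyname = '{c}'" for c in counties)
params = {
    "$where": f"stateabbr = 'MI' AND ({county_filter})",
    "$limit": 50000,
}

# urlencode percent-encodes the SoQL clause into a query string;
# requests performs the same encoding internally when given params=
query = urlencode(params)
print(query[:40])
```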

Processing

# Define the GeoJSON API endpoint
url = "https://data.cdc.gov/resource/cwsq-ngmh.geojson"

# Define the Detroit metro area counties
detroit_counties = ['Wayne', 'Oakland', 'Macomb']

# Create the county filter string
county_filter = " OR ".join([f"countyname = '{county}'" for county in detroit_counties])

# Define the query parameters
params = {
    "$where": f"stateabbr = 'MI' AND ({county_filter})",
    "$limit": 50000  # Adjust if necessary
}

# Make the API request
response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(f"Successfully retrieved data")
else:
    print(f"Failed to retrieve data. Status code: {response.status_code}")
    print(response.text)

# Convert to GeoDataFrame
gdf = gpd.read_file(response.text)

# Print available health measures
print("\nAvailable health measures:")
print(gdf['measure'].unique())

# Print the first few rows to see the structure of the data
print("\nFirst few rows of the GeoDataFrame:")
print(gdf.head())

# Print basic information about the GeoDataFrame
print("\nGeoDataFrame Info:")
print(gdf.info())

# Create a sample map for one health measure (e.g., Current asthma)
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Filter for the specific measure and ensure data_value is numeric
gdf_asthma = gdf[gdf['measure'] == 'Current asthma among adults'].copy()
gdf_asthma['data_value'] = pd.to_numeric(gdf_asthma['data_value'], errors='coerce')

# Plot the asthma data
gdf_asthma.plot(column='data_value', 
                ax=ax, 
                legend=True, 
                legend_kwds={'label': 'Asthma Prevalence (%)', 'orientation': 'horizontal'},
                cmap='YlOrRd',
                missing_kwds={'color': 'lightgrey'})

# Add county boundaries
#gdf_asthma.dissolve(by='countyname').boundary.plot(ax=ax, color='black', linewidth=0.5)

# Add basemap
ctx.add_basemap(ax, crs=gdf_asthma.crs.to_string(), source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent to match the Detroit metro area
ax.set_xlim(gdf_asthma.total_bounds[0], gdf_asthma.total_bounds[2])
ax.set_ylim(gdf_asthma.total_bounds[1], gdf_asthma.total_bounds[3])

plt.title('Asthma Prevalence in Detroit Metro Area', fontsize=16)
ax.axis('off')

plt.tight_layout()
plt.show()

# Print some statistics for the asthma data
print("\nAsthma Statistics:")
print(f"Average asthma prevalence: {gdf_asthma['data_value'].mean():.2f}%")
print(f"Minimum asthma prevalence: {gdf_asthma['data_value'].min():.2f}%")
print(f"Maximum asthma prevalence: {gdf_asthma['data_value'].max():.2f}%")

# Print the number of census tracts per county
print("\nNumber of census tracts per county:")
print(gdf_asthma['countyname'].value_counts())
Successfully retrieved data

Available health measures:
['Frequent physical distress among adults'
 'Cancer (non-skin) or melanoma among adults'
 'Visited dentist or dental clinic in the past year among adults'
 'Cognitive disability among adults' 'Coronary heart disease among adults'
 'Current asthma among adults'
 'Current lack of health insurance among adults aged 18-64 years'
 'Lack of social and emotional support among adults'
 'Arthritis among adults' 'Obesity among adults'
 'Self-care disability among adults' 'Mobility disability among adults'
 'Short sleep duration among adults'
 'Frequent mental distress among adults'
 'Fair or poor self-rated health status among adults'
 'Lack of reliable transportation in the past 12 months among adults'
 'All teeth lost among adults aged >=65 years'
 'Taking medicine to control high blood pressure among adults with high blood pressure'
 'Colorectal cancer screening among adults aged 45–75 years'
 'Vision disability among adults'
 'Visits to doctor for routine checkup within the past year among adults'
 'Diagnosed diabetes among adults'
 'Mammography use among women aged 50-74 years'
 'Independent living disability among adults'
 'Hearing disability among adults' 'Stroke among adults'
 'Chronic obstructive pulmonary disease among adults'
 'Food insecurity in the past 12 months among adults'
 'Feeling socially isolated among adults'
 'High cholesterol among adults who have ever been screened'
 'High blood pressure among adults'
 'No leisure-time physical activity among adults'
 'Cholesterol screening among adults'
 'Housing insecurity in the past 12 months among adults'
 'Utility services shut-off threat in the past 12 months among adults'
 'Binge drinking among adults'
 'Received food stamps in the past 12 months among adults'
 'Current cigarette smoking among adults' 'Any disability among adults'
 'Depression among adults']

First few rows of the GeoDataFrame:
                                             measure low_confidence_limit  \
0            Frequent physical distress among adults                 11.9   
1         Cancer (non-skin) or melanoma among adults                 10.6   
2  Visited dentist or dental clinic in the past y...                 68.9   
3                  Cognitive disability among adults                 14.1   
4                Coronary heart disease among adults                  7.5   

  data_value_unit data_value            short_question_text statedesc  \
0               %       13.2     Frequent Physical Distress  Michigan   
1               %       11.7  Cancer (non-skin) or Melanoma  Michigan   
2               %       71.7                   Dental Visit  Michigan   
3               %       15.7           Cognitive Disability  Michigan   
4               %        8.3         Coronary Heart Disease  Michigan   

  totalpop18plus   locationid countyname  year  ...   data_value_type  \
0           3015  26099206700     Macomb  2022  ...  Crude prevalence   
1           2972  26099241500     Macomb  2022  ...  Crude prevalence   
2           1250  26099240801     Macomb  2022  ...  Crude prevalence   
3           2545  26099255100     Macomb  2022  ...  Crude prevalence   
4           3423  26099254500     Macomb  2022  ...  Crude prevalence   

  data_value_footnote_symbol locationname         category datavaluetypeid  \
0                       None  26099206700    Health Status          CrdPrv   
1                       None  26099241500  Health Outcomes          CrdPrv   
2                       None  26099240801       Prevention          CrdPrv   
3                       None  26099255100       Disability          CrdPrv   
4                       None  26099254500  Health Outcomes          CrdPrv   

   measureid countyfips datasource totalpopulation                    geometry  
0      PHLTH      26099      BRFSS            3767  POINT (-83.00447 42.80593)  
1     CANCER      26099      BRFSS            3537  POINT (-82.96043 42.56028)  
2     DENTAL      26099      BRFSS            1451  POINT (-82.92256 42.57792)  
3  COGNITION      26099      BRFSS            3109  POINT (-82.91310 42.51732)  
4        CHD      26099      BRFSS            4289  POINT (-82.93637 42.53459)  

[5 rows x 24 columns]

GeoDataFrame Info:
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 46560 entries, 0 to 46559
Data columns (total 24 columns):
 #   Column                      Non-Null Count  Dtype   
---  ------                      --------------  -----   
 0   measure                     46560 non-null  object  
 1   low_confidence_limit        46560 non-null  object  
 2   data_value_unit             46560 non-null  object  
 3   data_value                  46560 non-null  object  
 4   short_question_text         46560 non-null  object  
 5   statedesc                   46560 non-null  object  
 6   totalpop18plus              46560 non-null  object  
 7   locationid                  46560 non-null  object  
 8   countyname                  46560 non-null  object  
 9   year                        46560 non-null  object  
 10  high_confidence_limit       46560 non-null  object  
 11  categoryid                  46560 non-null  object  
 12  stateabbr                   46560 non-null  object  
 13  data_value_footnote         0 non-null      object  
 14  data_value_type             46560 non-null  object  
 15  data_value_footnote_symbol  0 non-null      object  
 16  locationname                46560 non-null  object  
 17  category                    46560 non-null  object  
 18  datavaluetypeid             46560 non-null  object  
 19  measureid                   46560 non-null  object  
 20  countyfips                  46560 non-null  object  
 21  datasource                  46560 non-null  object  
 22  totalpopulation             46560 non-null  object  
 23  geometry                    46560 non-null  geometry
dtypes: geometry(1), object(23)
memory usage: 8.5+ MB
None

Asthma Statistics:
Average asthma prevalence: 12.06%
Minimum asthma prevalence: 7.20%
Maximum asthma prevalence: 17.20%

Number of census tracts per county:
countyname
Wayne      583
Oakland    345
Macomb     236
Name: count, dtype: int64
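The summary statistics above can be reproduced with plain pandas. The sketch below uses a few synthetic rows standing in for the PLACES records (the frame and values here are illustrative, not the real API return); note from the `GeoDataFrame Info` output that `data_value` arrives as an `object` (string) column, so an explicit numeric conversion is needed before computing statistics:

```python
import pandas as pd

# Synthetic stand-in for the PLACES asthma records; in the lesson these
# come from the CDC API, and data_value arrives as a string column.
df = pd.DataFrame({
    "countyname": ["Wayne", "Wayne", "Oakland", "Macomb"],
    "data_value": ["12.5", "10.1", "9.8", "14.0"],
})

# PLACES returns numbers as text, so convert before aggregating
df["data_value"] = pd.to_numeric(df["data_value"], errors="coerce")

print(f"Average asthma prevalence: {df['data_value'].mean():.2f}%")
print(f"Minimum asthma prevalence: {df['data_value'].min():.2f}%")
print(f"Maximum asthma prevalence: {df['data_value'].max():.2f}%")

# Census tracts per county, as in the output above
print(df["countyname"].value_counts())
```

`errors="coerce"` turns any footnote text or blanks into `NaN`, which the pandas aggregations then skip by default.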

Interpolate a Surface w/ IDW

from scipy.interpolate import griddata
from rasterio.transform import from_origin
from rasterio.warp import transform_bounds

# Assuming gdf_asthma is already created and contains the asthma data

# Ensure gdf_asthma is in EPSG:4326
gdf_asthma = gdf_asthma.to_crs(epsg=4326)

# Extract coordinates and values
X = gdf_asthma.geometry.x.values
Y = gdf_asthma.geometry.y.values
Z = pd.to_numeric(gdf_asthma['data_value'], errors='coerce').values  # PLACES stores values as strings

# Remove any NaN values
mask = ~np.isnan(Z)
X, Y, Z = X[mask], Y[mask], Z[mask]

# Create a grid to interpolate over
grid_resolution = 0.025  # grid cell size in degrees; smaller values give a finer surface
x_min, y_min, x_max, y_max = gdf_asthma.total_bounds
grid_x = np.arange(x_min, x_max, grid_resolution)
grid_y = np.arange(y_min, y_max, grid_resolution)
grid_xx, grid_yy = np.meshgrid(grid_x, grid_y)

# Interpolate onto the grid (note: scipy's griddata with method='linear'
# performs linear barycentric interpolation, a common stand-in for true IDW)
points = np.column_stack((X, Y))
grid_z = griddata(points, Z, (grid_xx, grid_yy), method='linear')

# Create an xarray Dataset from the interpolated data
ds = xr.Dataset(
    {'asthma': (['y', 'x'], grid_z)},
    coords={'x': grid_x, 'y': grid_y}
)

# Set the correct CRS
ds.rio.write_crs("EPSG:4326", inplace=True)

# Fetch Detroit metro area counties using pygris
metro_counties = pygris.counties(state="MI", year=2022)
detroit_metro = metro_counties[metro_counties['NAME'].isin([
    'Wayne', 'Oakland', 'Macomb'])]

# Dissolve the counties into a single polygon
detroit_metro = detroit_metro.dissolve(by='STATEFP')

# Ensure detroit_metro is in EPSG:4326
detroit_metro = detroit_metro.to_crs(epsg=4326)

# Clip the dataset using the detroit_metro polygon
ds_clipped = ds.rio.clip(detroit_metro.geometry, drop=False)

# Reproject to Web Mercator (EPSG:3857)
ds_3857 = ds_clipped.rio.reproject("EPSG:3857")

# Create the plot
fig, ax = plt.subplots(figsize=(7.5, 7.5))

# Plot the interpolated and clipped data
im = ds_3857.asthma.plot(ax=ax, cmap='YlOrRd', 
                         alpha=0.7, add_colorbar=False)

# Add colorbar with adjusted height
cbar = plt.colorbar(im, ax=ax, label='Asthma Prevalence', 
                    fraction=0.047, pad=0.04, aspect=20)

# Add basemap
ctx.add_basemap(ax, crs=ds_3857.rio.crs, 
                source=ctx.providers.OpenStreetMap.Mapnik)

# Set the extent to match the Detroit metro area
bounds_3857 = transform_bounds("EPSG:4326", "EPSG:3857", 
                               x_min, y_min, x_max, y_max)
ax.set_xlim(bounds_3857[0], bounds_3857[2])
ax.set_ylim(bounds_3857[1], bounds_3857[3])

plt.title('IDW Interpolated Asthma Prevalence in Detroit Metro Area', fontsize=16)
ax.axis('off')

plt.tight_layout()
plt.show()
Using FIPS code '26' for input 'MI'
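As noted in the code comments, `scipy.interpolate.griddata` with `method='linear'` is a linear interpolator rather than true inverse distance weighting. A hand-rolled IDW can be sketched in a few lines of NumPy; the function name `idw` and the `power` parameter below are our own choices, not part of any library, and the brute-force distance matrix is only practical for modest point counts like the tract centroids used here:

```python
import numpy as np

def idw(xy_points, values, xy_targets, power=2, eps=1e-12):
    """Inverse-distance-weighted estimate of `values` at `xy_targets`.

    xy_points : (n, 2) array of known coordinates
    values    : (n,) array of known values
    xy_targets: (m, 2) array of locations to estimate
    """
    # Pairwise distances between every target and every known point
    d = np.linalg.norm(xy_targets[:, None, :] - xy_points[None, :, :], axis=2)
    # Weights decay with distance; eps avoids division by zero at known points
    w = 1.0 / (d + eps) ** power
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Tiny example: four corner points, estimate at the center
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
vals = np.array([0.0, 10.0, 10.0, 20.0])
print(idw(pts, vals, np.array([[0.5, 0.5]])))  # equidistant, so the plain mean: [10.]
```

To use it in place of `griddata` above, pass `np.column_stack((X, Y))`, `Z`, and the flattened `grid_xx`/`grid_yy` pairs, then reshape the result back to the grid shape.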

Data Review

The Toxics Release Inventory (TRI) and the Integrated Compliance Information System for Air (ICIS-AIR) are two important but distinct environmental reporting systems maintained by the U.S. Environmental Protection Agency (EPA). They have several key differences:

  • Regulatory Basis
    • TRI: Established under the Emergency Planning and Community Right-to-Know Act (EPCRA) of 1986
    • ICIS-AIR: Part of the Clean Air Act (CAA) compliance and enforcement program
  • Focus
    • TRI: Tracks the management of certain toxic chemicals that may pose a threat to human health and the environment
    • ICIS-AIR: Focuses specifically on air quality and emissions from facilities regulated under the Clean Air Act
  • Reported Information
    • TRI: Facilities report on releases, waste management, and pollution prevention activities for specific toxic chemicals
    • ICIS-AIR: Tracks emissions data, compliance status, and enforcement actions related to air quality regulations
  • Facility Coverage
    • TRI: Covers facilities in specific industries that manufacture, process, or use TRI-listed chemicals above certain thresholds
    • ICIS-AIR: Includes a broader range of facilities that emit air pollutants, regardless of the specific chemicals involved
  • Reporting Thresholds
    • TRI: Has specific chemical thresholds that trigger reporting requirements
    • ICIS-AIR: Generally doesn’t have chemical-specific thresholds; requirements are based on overall emissions and facility type
  • Public Accessibility
    • TRI: Designed with a strong focus on public right-to-know, with data easily accessible to the public
    • ICIS-AIR: While public, it’s primarily designed for regulatory and enforcement purposes
  • Data Frequency
    • TRI: Annual reporting is required for covered facilities
    • ICIS-AIR: May involve more frequent reporting, depending on permit requirements and compliance status
  • Scope of Pollutants
    • TRI: Focuses on a specific list of toxic chemicals and chemical categories
    • ICIS-AIR: Covers a wider range of air pollutants, including criteria air pollutants and hazardous air pollutants
  • Use in Environmental Management
    • TRI: Often used for assessing long-term trends in toxic chemical releases and waste management practices
    • ICIS-AIR: More commonly used for day-to-day air quality management and enforcement activities
  • Geographic Coverage
    • TRI: Nationwide program with consistent reporting across states
    • ICIS-AIR: While national, implementation can vary more by state or local air quality management district
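Both systems are reachable through the EPA's Envirofacts REST service, which encodes a query directly in the URL path as a table name followed by column/value filter pairs. The snippet below only composes such a query string; the table name `tri_facility`, the column `state_abbr`, and the URL pattern are assumptions based on the Envirofacts documentation, and the actual request is left commented out so the sketch runs offline:

```python
BASE = "https://data.epa.gov/efservice"

def envirofacts_url(table, filters, fmt="JSON", rows=None):
    """Compose an Envirofacts REST query URL from column/value filter pairs."""
    parts = [BASE, table]
    for column, value in filters:
        parts += [column, value]
    if rows is not None:
        # Envirofacts row ranges are inclusive: rows/0:4 returns five records
        parts.append(f"rows/{rows[0]}:{rows[1]}")
    parts.append(fmt)
    return "/".join(parts)

# TRI facilities in Michigan, first five rows, as JSON
url = envirofacts_url("tri_facility", [("state_abbr", "MI")], rows=(0, 4))
print(url)

# To actually fetch (requires network access):
# import requests
# records = requests.get(url).json()
```

The same helper works for other Envirofacts tables by swapping the table name and filters.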

Deployment

Congratulations! You have completed this lesson. Now you should be able to:

  • Retrieve public health and air quality datasets through the EPA and CDC APIs.
  • Geolocate tabular data and map it alongside basemaps and multilayered SVI rasters.
  • Interpolate a raster surface from point observations.
  • Perform raster math and spatial correlation analysis.

Lesson 3

In this lesson, we explored the EPA’s Toxics Release Inventory (TRI), the EPA’s ICIS-AIR compliance data, and the CDC’s PLACES health outcomes dataset, and combined them with NASA’s Social Vulnerability Index (SVI) to examine socioeconomic patterns and public health relationships in the greater Detroit metropolitan area.
